972:
59:
986:
for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it. The figure on the right is the suffix tree for the strings "ABAB", "BABA" and "ABBA", padded with unique string terminators, to become "ABAB$ 0", "BABA$ 1" and "ABBA$ 2".
678:
1019:
time (if the size of the alphabet is constant). If the tree is traversed from the bottom up with a bit vector telling which strings are seen below each node, the k-common substring problem can be solved in
758:
594:
66:
The picture shows two strings where the problem has multiple solutions. Although the substring occurrences always overlap, it is impossible to obtain a longer common substring by "uniting" them.
957:
849:
898:
267:
385:
317:
1285:
1439:
1050:
794:
1083:
1017:
69:
The strings "ABABC", "BABCA" and "ABCBA" have only one longest common substring, viz. "ABC" of length 3. Other common substrings are "A", "AB", "B", "BA", "BC" and "C".
353:
534:
474:
1317:
1179:
506:
453:
433:
405:
198:
178:
158:
138:
118:
98:
1369:
Charalampopoulos, Panagiotis; Kociumaka, Tomasz; Pissis, Solon P.; Radoszewski, Jakub (Aug 2021). Mutzel, Petra; Pagh, Rasmus; Herman, Grzegorz (eds.).
1714:
1609:
599:
1444:
1648:
1574:
1327:
Store only non-zero values in the rows. This can be done using hash-tables instead of arrays. This is useful for large alphabets.
1373:. European Symposium on Algorithms. Leibniz International Proceedings in Informatics (LIPIcs). Vol. 204. Schloss Dagstuhl.
1579:
683:
1584:
539:
987:
The nodes representing "A", "B", "AB" and "BA" all have descendant leaves from all of the strings, numbered 0, 1 and 2.
1795:
1569:
1396:
1663:
1604:
1476:
1449:
1406:
1511:
456:
1093:
The following pseudocode finds the set of longest common substrings between two strings with dynamic programming:
907:
799:
854:
1691:
214:
1337:
1831:
1826:
1696:
1551:
1775:
1501:
1780:
1658:
1625:
1561:
20:
1790:
1686:
1630:
1531:
1485:
35:
1589:
1521:
358:
272:
1322:
The last and current row can be stored on the same 1D array by traversing the inner loop backwards
1246:
1734:
1023:
983:
975:
960:
767:
509:
1059:
993:
1053:
322:
1739:
1681:
1541:
1469:
1131:
L > z z := L ret := {S}
519:
459:
1546:
1290:
1152:
479:
47:
42:
of all of them. There may be more than one longest common substring. Applications include
8:
1744:
901:
761:
1424:
62:
The strings "BADANAT" and "CANADAS" share the maximal-length substrings "ADA" and "ANA".
1673:
1640:
438:
418:
390:
183:
163:
143:
123:
103:
83:
43:
1398:
Algorithms on
Strings, Trees and Sequences: Computer Science and Computational Biology
1805:
1402:
1224:, which is the last character of the longest common substring (of size z) instead of
415:
One can find the lengths and starting positions of the longest common substrings of
1800:
1770:
1724:
1526:
1462:
1374:
27:
1239:
The following tricks can be used to reduce the memory usage of an implementation:
1653:
1594:
1506:
1208:
is used to hold the length of the longest common substring found so far. The set
1706:
1599:
1379:
1450:
Suffix Tree based C implementation of
Longest common substring for two strings
1820:
1516:
1493:
1429:
982:
The longest common substrings of a set of strings can be found by building a
971:
1719:
1536:
1434:
1368:
1729:
58:
1425:
Dictionary of
Algorithms and Data Structures: longest common substring
1440:
Dynamic programming implementations in various languages on wikibooks
39:
1243:
Keep only the last and current row of the DP table to save memory (
513:
1765:
1228:. Thus all the longest common substrings would be, for each i in
673:{\textstyle O\left((n+m)\log \sigma /{\sqrt {\log(n+m)}}\right)}
1445:
working AS3 implementation of the dynamic programming algorithm
1342:
1185:
stores the length of the longest common suffix of the prefixes
978:
for the strings "ABAB", "BABA" and "ABBA", numbered 0, 1 and 2.
1138:L = z ret := ret âȘ {S}
1749:
1430:
Perl/XS implementation of the dynamic programming algorithm
1454:
753:{\displaystyle O\left((n+m)\log \sigma /\log(n+m)\right)}
387:, a longest string which occurs as substring of at least
1212:
is used to hold the set of strings which are of length
1052:
time. If the suffix tree is prepared for constant time
1123:j = 1 L := 1
602:
325:
1293:
1249:
1155:
1062:
1026:
996:
910:
857:
802:
770:
686:
589:{\displaystyle 2^{o\left({\sqrt {\log(n+m)}}\right)}}
542:
522:
482:
462:
441:
421:
393:
361:
275:
217:
186:
166:
146:
126:
106:
86:
1103:(1..r, 1..n) z := 0 ret := {}
1435:
Perl/XS implementation of the suffix tree algorithm
1220:can be saved efficiently by just storing the index
160:, find a longest string which is substring of both
1311:
1279:
1173:
1077:
1044:
1011:
951:
892:
843:
788:
752:
672:
588:
528:
500:
468:
447:
427:
399:
379:
347:
311:
261:
192:
172:
152:
132:
112:
92:
1818:
1256:
796:. The solutions to the generalized problem take
1371:Faster Algorithms for Longest Common Substring
1470:
952:{\displaystyle \Theta (n_{1}+\cdots +n_{K})}
844:{\displaystyle \Theta (n_{1}+\cdots +n_{K})}
512:. A faster algorithm can be achieved in the
256:
224:
1099:LongestCommonSubstring(S, T) L :=
1477:
1463:
893:{\displaystyle \Theta (n_{1}\cdots n_{K})}
1378:
262:{\displaystyle S=\{S_{1},\ldots ,S_{K}\}}
72:ABABC ||| BABCA ||| ABCBA
1649:Comparison of regular-expression engines
1394:
1348:, all the possible substrings of length
970:
596:. In particular, this algorithm runs in
57:
1819:
1088:
1610:ZhuâTakaoka string matching algorithm
1458:
1362:
75:
1388:
34:of two or more strings is a longest
1575:BoyerâMoore string-search algorithm
1401:. USA: Cambridge University Press.
13:
1063:
1027:
997:
911:
858:
803:
771:
463:
14:
1843:
1664:Nondeterministic finite automaton
1605:Two-way string-matching algorithm
1418:
516:model of computation if the size
1127:L := L + 1
1056:retrieval, it can be solved in
990:Building the suffix tree takes
1580:BoyerâMooreâHorspool algorithm
1570:ApostolicoâGiancarlo algorithm
1352:that are contained in a string
1306:
1297:
1274:
1271:
1259:
1253:
1168:
1159:
1072:
1066:
1039:
1030:
1006:
1000:
966:
946:
914:
887:
861:
838:
806:
783:
774:
760:space. Solving the problem by
742:
730:
707:
695:
660:
648:
623:
611:
575:
563:
495:
483:
292:
277:
1:
1356:
1338:Longest palindromic substring
1204:, respectively. The variable
410:
380:{\displaystyle 2\leq k\leq K}
312:{\displaystyle |S_{i}|=n_{i}}
1585:KnuthâMorrisâPratt algorithm
1512:DamerauâLevenshtein distance
1280:{\displaystyle O(\min(r,n))}
536:of the input alphabet is in
7:
1776:Compressed pattern matching
1502:Approximate string matching
1484:
1331:
1111:j := 1..n
1045:{\displaystyle \Theta (NK)}
789:{\displaystyle \Theta (nm)}
211:. Given the set of strings
53:
10:
1848:
1781:Longest common subsequence
1692:NeedlemanâWunsch algorithm
1562:String-searching algorithm
1380:10.4230/LIPIcs.ESA.2021.30
1078:{\displaystyle \Theta (N)}
1012:{\displaystyle \Theta (N)}
21:longest common subsequence
18:
1791:Sequential pattern mining
1758:
1705:
1672:
1639:
1631:Commentz-Walter algorithm
1619:Multiple string searching
1618:
1560:
1552:WagnerâFischer algorithm
1492:
348:{\textstyle \sum n_{i}=N}
1801:String rewriting systems
1786:Longest common substring
1697:SmithâWaterman algorithm
1522:Gestalt pattern matching
1385:Here: Theorem 1, p.30:2.
508:time with the help of a
209:common substring problem
203:A generalization is the
32:longest common substring
19:Not to be confused with
16:Computer science problem
1735:Generalized suffix tree
1659:Thompson's construction
1395:Gusfield, Dan (1999) .
1149:This algorithm runs in
1107:i := 1..r
984:generalized suffix tree
976:Generalized suffix tree
961:generalized suffix tree
529:{\displaystyle \sigma }
510:generalized suffix tree
469:{\displaystyle \Theta }
1687:Hirschberg's algorithm
1313:
1281:
1175:
1115:S = T
1079:
1054:lowest common ancestor
1046:
1013:
979:
953:
894:
845:
790:
754:
674:
590:
530:
502:
470:
449:
429:
401:
381:
349:
313:
263:
194:
174:
154:
134:
114:
94:
63:
1542:Levenshtein automaton
1532:JaroâWinkler distance
1314:
1312:{\displaystyle O(nr)}
1282:
1176:
1174:{\displaystyle O(nr)}
1080:
1047:
1014:
974:
954:
895:
846:
791:
755:
675:
591:
531:
503:
501:{\displaystyle (n+m)}
471:
450:
430:
402:
382:
350:
314:
264:
195:
175:
155:
135:
115:
95:
61:
1590:RabinâKarp algorithm
1547:Levenshtein distance
1291:
1247:
1153:
1060:
1024:
994:
908:
855:
800:
768:
684:
600:
540:
520:
480:
460:
439:
419:
391:
359:
323:
273:
215:
184:
164:
144:
124:
104:
84:
48:plagiarism detection
1832:Dynamic programming
1827:Problems on strings
1745:Ternary search tree
1089:Dynamic programming
902:dynamic programming
762:dynamic programming
80:Given two strings,
1674:Sequence alignment
1641:Regular expression
1309:
1277:
1171:
1075:
1042:
1009:
980:
949:
890:
841:
786:
750:
670:
586:
526:
498:
466:
445:
425:
397:
377:
345:
309:
259:
190:
170:
150:
130:
110:
90:
76:Problem definition
64:
44:data deduplication
1814:
1813:
1806:String operations
663:
578:
448:{\displaystyle T}
428:{\displaystyle S}
400:{\displaystyle k}
193:{\displaystyle T}
173:{\displaystyle S}
153:{\displaystyle n}
133:{\displaystyle T}
113:{\displaystyle m}
93:{\displaystyle S}
1839:
1771:Pattern matching
1725:Suffix automaton
1527:Hamming distance
1479:
1472:
1465:
1456:
1455:
1413:
1412:
1392:
1386:
1384:
1382:
1366:
1318:
1316:
1315:
1310:
1286:
1284:
1283:
1278:
1235:
1231:
1227:
1223:
1219:
1215:
1211:
1207:
1203:
1199:
1192:
1188:
1184:
1181:time. The array
1180:
1178:
1177:
1172:
1142:L := 0
1084:
1082:
1081:
1076:
1051:
1049:
1048:
1043:
1018:
1016:
1015:
1010:
958:
956:
955:
950:
945:
944:
926:
925:
899:
897:
896:
891:
886:
885:
873:
872:
850:
848:
847:
842:
837:
836:
818:
817:
795:
793:
792:
787:
759:
757:
756:
751:
749:
745:
723:
679:
677:
676:
671:
669:
665:
664:
641:
639:
595:
593:
592:
587:
585:
584:
583:
579:
556:
535:
533:
532:
527:
507:
505:
504:
499:
475:
473:
472:
467:
454:
452:
451:
446:
434:
432:
431:
426:
406:
404:
403:
398:
386:
384:
383:
378:
355:. Find for each
354:
352:
351:
346:
338:
337:
318:
316:
315:
310:
308:
307:
295:
290:
289:
280:
268:
266:
265:
260:
255:
254:
236:
235:
199:
197:
196:
191:
179:
177:
176:
171:
159:
157:
156:
151:
139:
137:
136:
131:
119:
117:
116:
111:
99:
97:
96:
91:
28:computer science
1847:
1846:
1842:
1841:
1840:
1838:
1837:
1836:
1817:
1816:
1815:
1810:
1754:
1701:
1668:
1654:Regular grammar
1635:
1614:
1595:Raita algorithm
1556:
1507:Bitap algorithm
1488:
1483:
1421:
1416:
1409:
1393:
1389:
1367:
1363:
1359:
1334:
1292:
1289:
1288:
1248:
1245:
1244:
1233:
1229:
1225:
1221:
1217:
1213:
1209:
1205:
1201:
1197:
1195:end at position
1190:
1186:
1182:
1154:
1151:
1150:
1147:
1091:
1061:
1058:
1057:
1025:
1022:
1021:
995:
992:
991:
969:
940:
936:
921:
917:
909:
906:
905:
881:
877:
868:
864:
856:
853:
852:
832:
828:
813:
809:
801:
798:
797:
769:
766:
765:
719:
694:
690:
685:
682:
681:
640:
635:
610:
606:
601:
598:
597:
555:
551:
547:
543:
541:
538:
537:
521:
518:
517:
481:
478:
477:
461:
458:
457:
440:
437:
436:
420:
417:
416:
413:
392:
389:
388:
360:
357:
356:
333:
329:
324:
321:
320:
303:
299:
291:
285:
281:
276:
274:
271:
270:
250:
246:
231:
227:
216:
213:
212:
185:
182:
181:
165:
162:
161:
145:
142:
141:
125:
122:
121:
105:
102:
101:
85:
82:
81:
78:
73:
56:
24:
17:
12:
11:
5:
1845:
1835:
1834:
1829:
1812:
1811:
1809:
1808:
1803:
1798:
1793:
1788:
1783:
1778:
1773:
1768:
1762:
1760:
1756:
1755:
1753:
1752:
1747:
1742:
1737:
1732:
1727:
1722:
1717:
1711:
1709:
1707:Data structure
1703:
1702:
1700:
1699:
1694:
1689:
1684:
1678:
1676:
1670:
1669:
1667:
1666:
1661:
1656:
1651:
1645:
1643:
1637:
1636:
1634:
1633:
1628:
1622:
1620:
1616:
1615:
1613:
1612:
1607:
1602:
1600:Trigram search
1597:
1592:
1587:
1582:
1577:
1572:
1566:
1564:
1558:
1557:
1555:
1554:
1549:
1544:
1539:
1534:
1529:
1524:
1519:
1514:
1509:
1504:
1498:
1496:
1490:
1489:
1482:
1481:
1474:
1467:
1459:
1453:
1452:
1447:
1442:
1437:
1432:
1427:
1420:
1419:External links
1417:
1415:
1414:
1407:
1387:
1360:
1358:
1355:
1354:
1353:
1340:
1333:
1330:
1329:
1328:
1325:
1324:
1323:
1308:
1305:
1302:
1299:
1296:
1276:
1273:
1270:
1267:
1264:
1261:
1258:
1255:
1252:
1170:
1167:
1164:
1161:
1158:
1095:
1090:
1087:
1074:
1071:
1068:
1065:
1041:
1038:
1035:
1032:
1029:
1008:
1005:
1002:
999:
968:
965:
948:
943:
939:
935:
932:
929:
924:
920:
916:
913:
889:
884:
880:
876:
871:
867:
863:
860:
840:
835:
831:
827:
824:
821:
816:
812:
808:
805:
785:
782:
779:
776:
773:
748:
744:
741:
738:
735:
732:
729:
726:
722:
718:
715:
712:
709:
706:
703:
700:
697:
693:
689:
668:
662:
659:
656:
653:
650:
647:
644:
638:
634:
631:
628:
625:
622:
619:
616:
613:
609:
605:
582:
577:
574:
571:
568:
565:
562:
559:
554:
550:
546:
525:
497:
494:
491:
488:
485:
465:
444:
424:
412:
409:
396:
376:
373:
370:
367:
364:
344:
341:
336:
332:
328:
306:
302:
298:
294:
288:
284:
279:
258:
253:
249:
245:
242:
239:
234:
230:
226:
223:
220:
189:
169:
149:
129:
109:
89:
77:
74:
71:
55:
52:
15:
9:
6:
4:
3:
2:
1844:
1833:
1830:
1828:
1825:
1824:
1822:
1807:
1804:
1802:
1799:
1797:
1794:
1792:
1789:
1787:
1784:
1782:
1779:
1777:
1774:
1772:
1769:
1767:
1764:
1763:
1761:
1757:
1751:
1748:
1746:
1743:
1741:
1738:
1736:
1733:
1731:
1728:
1726:
1723:
1721:
1718:
1716:
1713:
1712:
1710:
1708:
1704:
1698:
1695:
1693:
1690:
1688:
1685:
1683:
1680:
1679:
1677:
1675:
1671:
1665:
1662:
1660:
1657:
1655:
1652:
1650:
1647:
1646:
1644:
1642:
1638:
1632:
1629:
1627:
1624:
1623:
1621:
1617:
1611:
1608:
1606:
1603:
1601:
1598:
1596:
1593:
1591:
1588:
1586:
1583:
1581:
1578:
1576:
1573:
1571:
1568:
1567:
1565:
1563:
1559:
1553:
1550:
1548:
1545:
1543:
1540:
1538:
1535:
1533:
1530:
1528:
1525:
1523:
1520:
1518:
1517:Edit distance
1515:
1513:
1510:
1508:
1505:
1503:
1500:
1499:
1497:
1495:
1494:String metric
1491:
1487:
1480:
1475:
1473:
1468:
1466:
1461:
1460:
1457:
1451:
1448:
1446:
1443:
1441:
1438:
1436:
1433:
1431:
1428:
1426:
1423:
1422:
1410:
1408:0-521-58519-8
1404:
1400:
1399:
1391:
1381:
1376:
1372:
1365:
1361:
1351:
1347:
1345:
1341:
1339:
1336:
1335:
1326:
1321:
1320:
1303:
1300:
1294:
1268:
1265:
1262:
1250:
1242:
1241:
1240:
1237:
1196:
1165:
1162:
1156:
1145:
1141:
1137:
1134:
1130:
1126:
1122:
1118:
1114:
1110:
1106:
1102:
1098:
1094:
1086:
1069:
1055:
1036:
1033:
1003:
988:
985:
977:
973:
964:
962:
941:
937:
933:
930:
927:
922:
918:
903:
882:
878:
874:
869:
865:
833:
829:
825:
822:
819:
814:
810:
780:
777:
763:
746:
739:
736:
733:
727:
724:
720:
716:
713:
710:
704:
701:
698:
691:
687:
666:
657:
654:
651:
645:
642:
636:
632:
629:
626:
620:
617:
614:
607:
603:
580:
572:
569:
566:
560:
557:
552:
548:
544:
523:
515:
511:
492:
489:
486:
476:
442:
422:
408:
394:
374:
371:
368:
365:
362:
342:
339:
334:
330:
326:
304:
300:
296:
286:
282:
251:
247:
243:
240:
237:
232:
228:
221:
218:
210:
206:
201:
187:
167:
147:
127:
107:
87:
70:
67:
60:
51:
49:
45:
41:
37:
33:
29:
22:
1785:
1720:Suffix array
1626:AhoâCorasick
1537:Lee distance
1397:
1390:
1370:
1364:
1349:
1343:
1238:
1234:S-z)..(ret)]
1194:
1148:
1143:
1139:
1135:
1132:
1128:
1124:
1120:
1116:
1112:
1108:
1104:
1100:
1096:
1092:
989:
981:
959:time with a
414:
208:
204:
202:
79:
68:
65:
31:
25:
1730:Suffix tree
1287:instead of
967:Suffix tree
680:time using
1821:Categories
1357:References
1216:. The set
900:time with
851:space and
411:Algorithms
140:of length
100:of length
38:that is a
1064:Θ
1028:Θ
998:Θ
931:⋯
912:Θ
904:and take
875:⋯
859:Θ
823:⋯
804:Θ
772:Θ
728:
717:σ
714:
646:
633:σ
630:
561:
524:σ
464:Θ
407:strings.
372:≤
366:≤
327:∑
241:…
40:substring
1332:See also
1097:function
514:word RAM
269:, where
54:Examples
1796:Sorting
1766:Parsing
1486:Strings
1405:
1193:which
1144:return
1119:i = 1
1085:time.
764:costs
36:string
1759:Other
1715:DAFSA
1682:BLAST
1346:-gram
1101:array
1750:Trie
1740:Rope
1403:ISBN
1200:and
1189:and
1146:ret
1140:else
1133:else
1125:else
435:and
319:and
180:and
120:and
46:and
30:, a
1375:doi
1257:min
1230:ret
1218:ret
1210:ret
1109:for
1105:for
725:log
711:log
643:log
627:log
558:log
455:in
26:In
1823::
1319:)
1236:.
1232:,
1136:if
1129:if
1121:or
1117:if
1113:if
963:.
200:.
50:.
1478:e
1471:t
1464:v
1411:.
1383:.
1377::
1350:n
1344:n
1307:)
1304:r
1301:n
1298:(
1295:O
1275:)
1272:)
1269:n
1266:,
1263:r
1260:(
1254:(
1251:O
1226:S
1222:i
1214:z
1206:z
1202:j
1198:i
1191:T
1187:S
1183:L
1169:)
1166:r
1163:n
1160:(
1157:O
1073:)
1070:N
1067:(
1040:)
1037:K
1034:N
1031:(
1007:)
1004:N
1001:(
947:)
942:K
938:n
934:+
928:+
923:1
919:n
915:(
888:)
883:K
879:n
870:1
866:n
862:(
839:)
834:K
830:n
826:+
820:+
815:1
811:n
807:(
784:)
781:m
778:n
775:(
747:)
743:)
740:m
737:+
734:n
731:(
721:/
708:)
705:m
702:+
699:n
696:(
692:(
688:O
667:)
661:)
658:m
655:+
652:n
649:(
637:/
624:)
621:m
618:+
615:n
612:(
608:(
604:O
581:)
576:)
573:m
570:+
567:n
564:(
553:(
549:o
545:2
496:)
493:m
490:+
487:n
484:(
443:T
423:S
395:k
375:K
369:k
363:2
343:N
340:=
335:i
331:n
305:i
301:n
297:=
293:|
287:i
283:S
278:|
257:}
252:K
248:S
244:,
238:,
233:1
229:S
225:{
222:=
219:S
207:-
205:k
188:T
168:S
148:n
128:T
108:m
88:S
23:.
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.