HH-suite - Knowledge

(MSAs) starting from a single query sequence or an MSA. As in PSI-BLAST, it works iteratively, repeatedly constructing new query profiles by adding the results found in the previous round. It matches against pre-built HMM databases derived from protein sequence databases, in which each HMM represents a "cluster" of related proteins. In HHblits, these matches are made at the level of HMM-HMM comparison, which grants additional sensitivity. Its prefilter reduces the tens of millions of HMMs to match against to a few thousand, thus speeding up the slow HMM-HMM comparison process.
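The iterative idea described above can be illustrated with a deliberately simplified sketch. It is not the HHblits algorithm: there are no HMMs and no prefilter, the "profile" is only the set of k-mers seen so far, and the names `kmers`, `iterative_search`, the threshold, and the toy sequences are all assumptions made for illustration. It only shows how adding the hits of one round to the query profile lets a later round reach a more remote relative.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def iterative_search(query, database, threshold=0.5, max_rounds=3, k=3):
    """Toy iterative search (hypothetical, for illustration only).

    The profile starts as the k-mers of the query. In each round, every
    database sequence not yet accepted is scored by the fraction of its
    k-mers found in the profile. Accepted hits are merged into the profile
    only after the round ends, so the next round searches with a broader query.
    """
    profile = kmers(query, k)
    hits = []  # names of accepted sequences, in order of acceptance
    for _ in range(max_rounds):
        new_hits = []
        for name, seq in database.items():
            if name in hits:
                continue
            candidate = kmers(seq, k)
            if not candidate:
                continue
            score = len(candidate & profile) / len(candidate)
            if score >= threshold:
                new_hits.append(name)
        if not new_hits:
            break  # converged: nothing new was found in this round
        for name in new_hits:
            hits.append(name)
            profile |= kmers(database[name], k)
    return hits


toy_database = {
    "close": "MKTAYLQR",      # shares its first half with the query
    "remote": "GSWAYLQR",     # shares only its second half with "close"
    "unrelated": "PPGPPGPP",
}
found = iterative_search("MKTAYIAK", toy_database)
```

In this toy data set, "remote" has no k-mer in common with the query itself and is only reached in the second round, through the intermediate hit "close".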

functions remain unknown. Many proteins have been investigated in model organisms such as bacteria, baker's yeast, fruit flies, zebrafish, or mice, for which experiments can often be done more easily than with human cells. To predict the function, structure, or other properties of a protein for which only the amino acid sequence is known, the protein sequence is compared to the sequences of other proteins in public databases. If a protein with a sufficiently similar sequence is found, the two proteins are likely to be evolutionarily related ("homologous").

(MSAs), in which related proteins are written together (aligned), such that the frequencies of amino acids at each position can be interpreted as probabilities for amino acids in new related proteins and used to derive the "similarity scores". Because profiles contain much more information than a single sequence (e.g. the position-specific degree of conservation), profile-profile comparison methods are much more powerful than sequence-sequence comparison methods like BLAST

In that case, they are likely to share similar structures and functions. Therefore, if a protein with a sufficiently similar sequence and with known functions and/or structure can be found by the sequence search, the unknown protein's functions, structure, and domain composition can be predicted. Such predictions greatly facilitate the determination of the function or structure by targeted validation experiments.

of sequences related to the query sequence/MSA using the HHblits program. From this alignment, a profile HMM is calculated. The databases contain HMMs that are precalculated in the same fashion using PSI-BLAST. The output of HHpred and HHsearch is a ranked list of database matches (including E-values

7, 8, and 9, for blind protein structure prediction experiments. In CASP9, HHpredA, B, and C were ranked 1st, 2nd, and 3rd out of 81 participating automatic structure prediction servers in template-based modeling and 6th, 7th, 8th on all 147 targets, while being much faster than the best 20 servers.

of the query with the template protein sequence. For example, a search through the PDB database of proteins with solved 3D structure takes a few minutes. If a significant match with a protein of known structure (a "template") is found in the PDB database, HHpred allows the user to build a homology model using the MODELLER

Proteins are central players in all of life's processes. Understanding them is central to understanding molecular processes in cells. This is particularly important in order to understand the origin of diseases. But for a large fraction of the approximately 20 000 human proteins the structures and

Sequence searches are frequently performed by biologists to infer the function of an unknown protein from its sequence. For this purpose, the protein's sequence is compared to the sequences of other proteins in public databases and its function is deduced from those of the most similar sequences.

In addition to HHsearch and HHblits, the HH-suite contains programs and Perl scripts for format conversion, filtering of MSAs, generation of profile HMMs, the addition of secondary structure predictions to MSAs, the extraction of alignments from program output, and the generation of customized databases.

The HH-suite is an open-source software package for sensitive protein sequence searching. It contains programs that can search for similar protein sequences in protein sequence databases. Sequence searches are a standard tool in modern biology with which the function of unknown proteins can be inferred from the functions of proteins with similar sequences.

(HMMs), an extension of PSSM sequence profiles that also records position-specific amino acid insertion and deletion frequencies. HHsearch searches a database of HMMs with a query HMM. Before starting the search through the actual database of HMMs, HHsearch/HHpred builds a multiple sequence alignment

Modern sensitive methods for protein search use sequence profiles. They may be used to compare a sequence to a profile or, in more advanced cases such as HH-suite, to compare profiles with each other. Profiles and alignments are themselves derived from matches, using for example PSI-BLAST or HHblits.

Applications of HHpred and HHsearch include protein structure prediction, complex structure prediction, function prediction, domain prediction, domain boundary prediction, and evolutionary classification of proteins.

The HH-suite searches for sequences using hidden Markov models (HMMs). The name comes from the fact that it performs HMM-HMM alignments. The programs are among the most popular methods for protein sequence matching and have been cited more than 5000 times in total according to Google Scholar.

can be inferred. HHsearch performs searches with a protein sequence through databases. The HHpred server and the HH-suite software package offer many popular, regularly updated databases, such as the Protein Data Bank, as well as the InterPro, Pfam, COG, and SCOP databases.

is searched for "template" proteins similar to the query protein. If such a template protein is found, the structure of the protein of interest can be predicted based on a pairwise sequence alignment

HHsearch is often used for homology modeling, that is, to build a model of the structure of a query protein for which only the sequence is known. For that purpose, a database of proteins with known structures such as the Protein Data Bank

Often, no sequences with annotated functions can be found in such a search. In this case, more sensitive methods are required to identify more remotely related proteins or protein families.

In CASP8, HHpred was ranked 7th on all targets and 2nd on the subset of single-domain proteins, while still being more than 50 times faster than the top-ranked servers.

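The HH-suite comes with a number of pre-built profile HMM databases that can be searched using HHblits and HHsearch, among them a clustered version of the UniProt database, of the Protein Data Bank of proteins with known structures, of Pfam protein family alignments, of SCOP structural protein domains, and many more.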

A position-specific scoring matrix (PSSM) profile contains, for each position in the query sequence, the similarity score for each of the 20 amino acids. The profiles are derived from multiple sequence alignments
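As a rough illustration of how such a profile can be derived from an alignment, the following minimal sketch turns the columns of a toy MSA into log-odds scores. It assumes a uniform background frequency of 1/20 and simple pseudocounts, which is far cruder than what HH-suite or PSI-BLAST do (no sequence weighting, no substitution-matrix pseudocounts, no insertion or deletion states). The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa, pseudocount=1.0):
    """Return one {amino_acid: log2-odds score} dict per alignment column.

    Assumption: uniform background (1/20) and a flat pseudocount per residue.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count residues in this column; gap characters are ignored.
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_sequence(pssm, sequence):
    """Sum the position-specific scores of an ungapped sequence of equal length."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, sequence))


toy_msa = ["ACDK", "ACEK", "ACDR", "GCDK"]
toy_pssm = build_pssm(toy_msa)
```

A sequence that follows the conserved columns, such as "ACDK", scores higher against `toy_pssm` than an unrelated one such as "WWWW", which is the position-specific information a single sequence cannot provide.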

HHsearch and HHblits are the two main programs in the package and the entry point to its search function, the latter being a faster iteration.

Developer(s): Johannes Söding, Michael Remmert, Andreas Biegert, Andreas Hauser, Markus Meier, Martin Steinegger

and probabilities for a true relationship) and the pairwise query-database sequence alignments.
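The workflow described here (build an MSA with HHblits, derive a profile HMM, search an HMM database) can be sketched as three command lines. The sketch below only assembles the argument lists and does not run anything. The option names (`-i`, `-d`, `-o`, `-oa3m`, `-n`) are given as recalled from the HH-suite command-line help and should be checked against each program's `-h` output; the file names and database paths are placeholders, and the helper function names are hypothetical.

```python
import shlex


def hhblits_command(query, database, out_a3m, iterations=3):
    """Iteratively search `database` and write the resulting MSA in A3M format."""
    return ["hhblits", "-i", query, "-d", database,
            "-oa3m", out_a3m, "-n", str(iterations)]


def hhmake_command(msa, out_hhm):
    """Build a profile HMM from an input MSA."""
    return ["hhmake", "-i", msa, "-o", out_hhm]


def hhsearch_command(query_hmm, database, out_hhr):
    """Search a database of HMMs with a query HMM and write the ranked hit list."""
    return ["hhsearch", "-i", query_hmm, "-d", database, "-o", out_hhr]


def build_pipeline(query_fasta, msa_database, search_database):
    """Return the three commands in the order they would be run."""
    return [
        hhblits_command(query_fasta, msa_database, "query.a3m"),
        hhmake_command("query.a3m", "query.hhm"),
        hhsearch_command("query.hhm", search_database, "query.hhr"),
    ]


if __name__ == "__main__":
    # Placeholder paths; print the commands instead of executing them.
    for command in build_pipeline("query.fasta", "path/to/msa_db", "path/to/pdb_db"):
        print(shlex.join(command))
```

Keeping each command as a list of arguments rather than a single string means it could later be passed to `subprocess.run` without shell quoting problems.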

The HMM-HMM alignment algorithm of HHblits and HHsearch was significantly accelerated using vector instructions in version 3 of the HH-suite.

renumberpdb.pl: Generate PDB file with indices renumbered to match input sequence indices

hhblitsdb.pl: Build HHblits database with prefiltering, packed MSA/HMM, and index files

CASP - Critical Assessment of Techniques for Protein Structure Prediction

splitfasta.pl: Split a multiple-sequence FASTA file into multiple single-sequence files

hhfilter: Filter an MSA by maximum sequence identity, coverage, and other criteria

hhblits: (Iteratively) search an HHblits database with a query sequence or MSA

From these relationships, hypotheses about the protein's functions, structure, and domain composition

hhmakemodel.pl: Generate MSAs or coarse 3D models from HHsearch or HHblits results

567: 455: 649: 526: 390: 676:

Run a command for many files in parallel using multiple threads

hhalign: Calculate pairwise alignments, dot plots etc. for two HMMs/MSAs

HHblits, a part of the HH-suite since 2011, builds high-quality multiple sequence alignments

software, starting from the pairwise query-template alignment.

HHpred servers have been ranked among the best servers during CASP

HHpred and HHsearch represent query and database proteins by profile hidden Markov models

hhsearch: Search an HHsearch database of HMMs with a query MSA or HMM

Soeding Lab at Max-Planck Institute in Göttingen - HH-suite developers
HHpred - free server at Max-Planck Institute in Tuebingen
HHblits - free server at Max-Planck Institute in Tuebingen

or profile-sequence comparison methods like PSI-BLAST.

1011: 652:

predicted secondary structure to an MSA or HHM file

HH-suite

Stable release: 3.3.0 / 25 August 2020
Repository: github.com/soedinglab/hh-suite
Written in: C++
Operating system: Unix-like; Debian package available
Available in: English
Type: Bioinformatics tool
License: GPL v3
Website: https://github.com/soedinglab/hh-suite

HHpred is an online server for protein structure prediction that uses homology information from HH-suite.

Figure: Iterative sequence search scheme of HHblits

hhmake: Build an HMM from an input MSA
reformat.pl: Reformat one or many MSAs

See also

Sequence alignment software
Protein structure prediction
Position-specific scoring matrix
Multiple sequence alignment
BLAST (Basic Local Alignment Search Tool)
Context-specific BLAST (CS-BLAST)

External links

Precompiled HH-suite binaries and databases, download from developers
CASP website
CASP9 template-based modeling results
HH-suite Debian package
HH-suite Ubuntu package
HH-suite Arch Linux user repository

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
