Statistical coupling analysis

Statistical coupling analysis or SCA is a technique used in bioinformatics to measure covariation between pairs of amino acids in a protein multiple sequence alignment (MSA). More specifically, it quantifies how much the amino acid distribution at some position i changes upon a perturbation of the amino acid distribution at another position j. The resulting statistical coupling energy indicates the degree of evolutionary dependence between the residues, with higher coupling energy corresponding to increased dependence.

Definition of statistical coupling energy

Statistical coupling energy measures how a perturbation of amino acid distribution at one site in an MSA affects the amino acid distribution at another site. For example, consider a multiple sequence alignment with sites (or columns) a through z, where each site has some distribution of amino acids. At position i, 60% of the sequences have a valine and the remaining 40% of sequences have a leucine; at position j the distribution is 40% isoleucine, 40% histidine and 20% methionine; k has an average distribution (the 20 amino acids are present at roughly the same frequencies seen in all proteins); and l has 80% histidine, 20% valine. Since positions i, j and l have an amino acid distribution different from the mean distribution observed in all proteins, they are said to have some degree of conservation.

In statistical coupling analysis, the conservation ($\Delta G^{stat}$) at each site (i) is defined as:

$$\Delta G_{i}^{stat} = \sqrt{\sum_{x} \left(\ln P_{i}^{x}\right)^{2}}$$

Here, $P_{i}^{x}$ describes the probability of finding amino acid x at position i, and is defined by a function in binomial form as follows:

$$P_{i}^{x} = \frac{N!}{n_{x}!\,(N-n_{x})!}\, p_{x}^{n_{x}} (1-p_{x})^{N-n_{x}}$$

where N is 100, $n_{x}$ is the percentage of sequences with residue x (e.g. methionine) at position i, and $p_{x}$ corresponds to the approximate distribution of amino acid x in all positions among all sequenced proteins. The summation runs over all 20 amino acids. After $\Delta G_{i}^{stat}$ is computed, the conservation for position i in a subalignment produced after a perturbation of amino acid distribution at j ($\Delta G_{i|\delta j}^{stat}$) is taken. Statistical coupling energy, denoted $\Delta\Delta G_{i,j}^{stat}$, is simply the difference between these two values. That is:

$$\Delta\Delta G_{i,j}^{stat} = \Delta G_{i|\delta j}^{stat} - \Delta G_{i}^{stat}$$

or, more commonly,

$$\Delta\Delta G_{i,j}^{stat} = \sqrt{\sum_{x} \left(\ln P_{i|\delta j}^{x} - \ln P_{i}^{x}\right)^{2}}$$

Statistical coupling energy is often systematically calculated between a fixed, perturbed position and all other positions in an MSA. Continuing with the example MSA from the beginning of the section, consider a perturbation at position j where the amino acid distribution changes from 40% I, 40% H, 20% M to 100% I. If, in a subsequent subalignment, this changes the distribution at i from 60% V, 40% L to 90% V, 10% L, but does not change the distribution at position l, then there would be some amount of statistical coupling energy between i and j but none between l and j.
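
The conservation and coupling-energy definitions in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the Ranganathan lab's implementation: the function names are hypothetical, and the background frequencies $p_x$ are assumed to be a uniform 0.05 per amino acid, whereas real background frequencies differ between residues. Logarithms of the binomial probabilities are computed directly to avoid floating-point underflow.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N = 100  # the text fixes N at 100, so counts are percentages

# ASSUMPTION: uniform background frequencies, for illustration only.
BACKGROUND = {aa: 0.05 for aa in AMINO_ACIDS}


def log_binomial_prob(n_x, p_x, n_total=N):
    """ln P_i^x for n_x occurrences out of n_total, background frequency p_x."""
    log_coeff = (math.lgamma(n_total + 1)
                 - math.lgamma(n_x + 1)
                 - math.lgamma(n_total - n_x + 1))
    return log_coeff + n_x * math.log(p_x) + (n_total - n_x) * math.log1p(-p_x)


def conservation(site, background=BACKGROUND):
    """Delta G_i^stat = sqrt(sum_x (ln P_i^x)^2); `site` maps residue -> percent."""
    return math.sqrt(sum(
        log_binomial_prob(site.get(aa, 0), p) ** 2
        for aa, p in background.items()))


def coupling_energy(before, after, background=BACKGROUND):
    """Delta Delta G_{i,j}^stat in the square-root form given in the text."""
    return math.sqrt(sum(
        (log_binomial_prob(after.get(aa, 0), p)
         - log_binomial_prob(before.get(aa, 0), p)) ** 2
        for aa, p in background.items()))


# Worked example from the text: perturbing j to 100% I shifts site i
# from 60% V / 40% L to 90% V / 10% L, and leaves site l unchanged.
site_i_before = {"V": 60, "L": 40}
site_i_after = {"V": 90, "L": 10}
site_l = {"H": 80, "V": 20}

ddg_ij = coupling_energy(site_i_before, site_i_after)
ddg_lj = coupling_energy(site_l, site_l)

print(f"conservation at i: {conservation(site_i_before):.2f}")
print(f"coupling i-j: {ddg_ij:.2f}")
print(f"coupling l-j: {ddg_lj:.2f}")
```

Under these assumptions the coupling energy between i and j is positive, while the coupling energy between l and j is zero because the distribution at l does not change in the subalignment.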
Applications

Ranganathan and Lockless originally developed SCA to examine thermodynamic (energetic) coupling of residue pairs in proteins. Using the PDZ domain family, they were able to identify a small network of residues that were energetically coupled to a binding site residue. The network consisted of both residues spatially close to the binding site in the tertiary fold, called contact pairs, and more distant residues that participate in longer-range energetic interactions. Later applications of SCA by the Ranganathan group on the GPCR, serine protease and hemoglobin families also showed energetic coupling in sparse networks of residues that cooperate in allosteric communication.

Statistical coupling analysis has also been used as a basis for computational protein design. In 2005, Socolich et al. used an SCA for the WW domain to create artificial proteins with similar thermodynamic stability and structure to natural WW domains. The fact that 12 out of the 43 designed proteins with the same SCA profile as natural WW domains properly folded provided strong evidence that little information (only coupling information) was required for specifying the protein fold. This support for the SCA hypothesis was made more compelling considering that a) the successfully folded proteins had only 36% average sequence identity to natural WW folds, and b) none of the artificial proteins designed without coupling information folded properly. An accompanying study showed that the artificial WW domains were functionally similar to natural WW domains in ligand binding affinity and specificity.

In de novo protein structure prediction, it has been shown that, when combined with a simple residue-residue distance metric, SCA-based scoring can fairly accurately distinguish native from non-native protein folds.

See also

Mutual information

External links

What is a WW domain?
Ranganathan lecture on statistical coupling analysis (audio included)
Protein folding — a step closer? - A summary of the Ranganathan lab's SCA-based design of artificial yet functional WW domains.

References

"Supplementary Material for 'Evolutionarily conserved networks of residues mediate allosteric communication in proteins.'"
Dekker; Fodor, A; Aldrich, RW; Yellen, G; et al. (2004). "A perturbation-based method for calculating explicit likelihood of evolutionary co-variance in multiple sequence alignments". Bioinformatics. 20 (10): 1565–1572. doi:10.1093/bioinformatics/bth128. PMID 14962924.
Lockless SW, Ranganathan R (1999). "Evolutionarily conserved pathways of energetic connectivity in protein families". Science. 286 (5438): 295–299. doi:10.1126/science.286.5438.295. PMID 10514373.
Suel; Lockless, SW; Wall, MA; Ranganathan, R; et al. (2003). "Evolutionarily conserved networks of residues mediate allosteric communication in proteins". Nature Structural Biology. 10 (1): 59–69. doi:10.1038/nsb881. PMID 12483203. S2CID 67749580.
Socolich; Lockless, SW; Russ, WP; Lee, H; Gardner, KH; Ranganathan, R; et al. (2005). "Evolutionary information for specifying a protein fold". Nature. 437 (7058): 512–518. Bibcode:2005Natur.437..512S. doi:10.1038/nature03991. PMID 16177782. S2CID 4363255.
Russ; Lowery, DM; Mishra, P; Yaffe, MB; Ranganathan, R; et al. (2005). "Natural-like function in artificial WW domains". Nature. 437 (7058): 579–583. Bibcode:2005Natur.437..579R. doi:10.1038/nature03990. PMID 16177795. S2CID 4424336.
Bartlett GJ, Taylor WR (2008). "Using scores derived from statistical coupling analysis to distinguish correct and incorrect folds in de-novo protein structure prediction". Proteins. 71 (1): 950–959. doi:10.1002/prot.21779. PMID 18004776. S2CID 33836866. Archived from the original on 2012-12-17.


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.