Statistical coupling analysis

Statistical coupling analysis, or SCA, is a technique used in bioinformatics to measure covariation between pairs of amino acids in a protein multiple sequence alignment (MSA). More specifically, it quantifies how much the amino acid distribution at some position i changes upon a perturbation of the amino acid distribution at another position j. The resulting statistical coupling energy indicates the degree of evolutionary dependence between the residues, with higher coupling energy corresponding to increased dependence.

Definition of statistical coupling energy

Statistical coupling energy measures how a perturbation of the amino acid distribution at one site in an MSA affects the amino acid distribution at another site. For example, consider a multiple sequence alignment with sites (or columns) a through z, where each site has some distribution of amino acids. At position i, 60% of the sequences have a valine and the remaining 40% have a leucine; at position j the distribution is 40% isoleucine, 40% histidine and 20% methionine; k has an average distribution (the 20 amino acids are present at roughly the same frequencies seen in all proteins); and l has 80% histidine and 20% valine. Since positions i, j and l have an amino acid distribution different from the mean distribution observed in all proteins, they are said to have some degree of conservation.

In statistical coupling analysis, the conservation ($\Delta G^{stat}$) at each site (i) is defined as:

$$\Delta G_{i}^{stat} = \sqrt{\sum_{x} \left(\ln P_{i}^{x}\right)^{2}}$$

Here, $P_{i}^{x}$ describes the probability of finding amino acid x at position i, and is defined by a function in binomial form as follows:

$$P_{i}^{x} = \frac{N!}{n_{x}!\,(N-n_{x})!}\, p_{x}^{n_{x}} (1-p_{x})^{N-n_{x}}$$

where N is 100, $n_{x}$ is the percentage of sequences with residue x (e.g. methionine) at position i, and $p_{x}$ corresponds to the approximate frequency of amino acid x across all positions in all sequenced proteins. The summation runs over all 20 amino acids. After $\Delta G_{i}^{stat}$ is computed, the conservation for position i in a subalignment produced after a perturbation of the amino acid distribution at j ($\Delta G_{i|\delta j}^{stat}$) is taken. Statistical coupling energy, denoted $\Delta\Delta G_{i,j}^{stat}$, is simply the difference between these two values. That is:

$$\Delta\Delta G_{i,j}^{stat} = \Delta G_{i|\delta j}^{stat} - \Delta G_{i}^{stat}$$

or, more commonly,

$$\Delta\Delta G_{i,j}^{stat} = \sqrt{\sum_{x} \left(\ln P_{i|\delta j}^{x} - \ln P_{i}^{x}\right)^{2}}$$

Statistical coupling energy is often calculated systematically between a fixed, perturbed position and all other positions in an MSA. Continuing with the example MSA from the beginning of the section, consider a perturbation at position j where the amino acid distribution changes from 40% I, 40% H, 20% M to 100% I. If, in the resulting subalignment, this changes the distribution at i from 60% V, 40% L to 90% V, 10% L, but does not change the distribution at position l, then there would be some amount of statistical coupling energy between i and j but none between l and j.

Applications

Ranganathan and Lockless originally developed SCA to examine thermodynamic (energetic) coupling of residue pairs in proteins. Using the PDZ domain family, they were able to identify a small network of residues that were energetically coupled to a binding site residue. The network consisted of both residues spatially close to the binding site in the tertiary fold, called contact pairs, and more distant residues that participate in longer-range energetic interactions. Later applications of SCA by the Ranganathan group on the GPCR, serine protease and hemoglobin families also showed energetic coupling in sparse networks of residues that cooperate in allosteric communication.

Statistical coupling analysis has also been used as a basis for computational protein design. In 2005, Socolich et al. used an SCA for the WW domain to create artificial proteins with similar thermodynamic stability and structure to natural WW domains. The fact that 12 out of the 43 designed proteins with the same SCA profile as natural WW domains folded properly provided strong evidence that little information (only coupling information) was required for specifying the protein fold. This support for the SCA hypothesis was made more compelling considering that a) the successfully folded proteins had only 36% average sequence identity to natural WW folds, and b) none of the artificial proteins designed without coupling information folded properly. An accompanying study showed that the artificial WW domains were functionally similar to natural WW domains in ligand binding affinity and specificity.

In de novo protein structure prediction, it has been shown that, when combined with a simple residue-residue distance metric, SCA-based scoring can fairly accurately distinguish native from non-native protein folds.

See also

Mutual information

External links

What is a WW domain?
Ranganathan lecture on statistical coupling analysis (audio included)
Protein folding — a step closer? - A summary of the Ranganathan lab's SCA-based design of artificial yet functional WW domains.

References

"Supplementary Material for 'Evolutionarily conserved networks of residues mediate allosteric communication in proteins.'"
Dekker; Fodor, A; Aldrich, RW; Yellen, G; et al. (2004). "A perturbation-based method for calculating explicit likelihood of evolutionary co-variance in multiple sequence alignments". Bioinformatics. 20 (10): 1565–1572. doi:10.1093/bioinformatics/bth128. PMID 14962924.
Lockless, SW; Ranganathan, R (1999). "Evolutionarily conserved pathways of energetic connectivity in protein families". Science. 286 (5438): 295–299. doi:10.1126/science.286.5438.295. PMID 10514373.
Suel; Lockless, SW; Wall, MA; Ranganathan, R; et al. (2003). "Evolutionarily conserved networks of residues mediate allosteric communication in proteins". Nature Structural Biology. 10 (1): 59–69. doi:10.1038/nsb881. PMID 12483203.
Socolich; Lockless, SW; Russ, WP; Lee, H; Gardner, KH; Ranganathan, R; et al. (2005). "Evolutionary information for specifying a protein fold". Nature. 437 (7058): 512–518. doi:10.1038/nature03991. PMID 16177782.
Russ; Lowery, DM; Mishra, P; Yaffe, MB; Ranganathan, R; et al. (2005). "Natural-like function in artificial WW domains". Nature. 437 (7058): 579–583. doi:10.1038/nature03990. PMID 16177795.
Bartlett, GJ; Taylor, WR (2008). "Using scores derived from statistical coupling analysis to distinguish correct and incorrect folds in de-novo protein structure prediction". Proteins. 71 (1): 950–959. doi:10.1002/prot.21779. PMID 18004776. Archived from the original on 2012-12-17.
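As an illustration of the definition of statistical coupling energy given in this article, the following is a minimal Python sketch of the conservation formula and of the "more commonly" used vector form of the coupling energy. It is a sketch under stated assumptions, not the published implementation: the function names (`log_binomial_prob`, `log_prob_vector`, `conservation`, `coupling_energy`) are hypothetical, the background frequency p_x is assumed to be uniform (1/20) because the article gives no background values, and each site is described by integer percentages so that N = 100 as in the text. The example sites reuse the article's numbers: site i shifts from 60% V, 40% L to 90% V, 10% L after the perturbation at j, while site l (80% H, 20% V) is unchanged.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N = 100  # the article fixes N at 100, so counts are percentages

# Assumption: a uniform background frequency p_x for every amino acid.
# Real analyses would use frequencies observed across sequenced proteins.
BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def log_binomial_prob(n_x, p_x, n=N):
    """Return ln P for observing n_x of n residues, given background p_x."""
    return (
        math.log(math.comb(n, n_x))
        + n_x * math.log(p_x)
        + (n - n_x) * math.log(1 - p_x)
    )


def log_prob_vector(percentages, background=BACKGROUND):
    """Return ln P_i^x for all 20 amino acids; absent residues count as 0%."""
    return [
        log_binomial_prob(percentages.get(aa, 0), background[aa])
        for aa in AMINO_ACIDS
    ]


def conservation(percentages):
    """Conservation of one site: sqrt of the sum over x of (ln P_i^x)^2."""
    return math.sqrt(sum(lp ** 2 for lp in log_prob_vector(percentages)))


def coupling_energy(before, after):
    """Coupling energy in vector form, comparing a site before and after
    the perturbation: sqrt of the sum over x of (ln P_after - ln P_before)^2."""
    lp_before = log_prob_vector(before)
    lp_after = log_prob_vector(after)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lp_after, lp_before)))


# Example sites from the article (integer percentages summing to 100).
site_i_before = {"V": 60, "L": 40}
site_i_after = {"V": 90, "L": 10}  # site i in the subalignment after perturbing j
site_l = {"H": 80, "V": 20}        # site l is unchanged by the perturbation

print("conservation at i:", conservation(site_i_before))
print("coupling energy i,j:", coupling_energy(site_i_before, site_i_after))
print("coupling energy l,j:", coupling_energy(site_l, site_l))
```

Working with log-probabilities directly avoids underflow, since the binomial probabilities for strongly conserved sites are extremely small. Because site l has the same distribution before and after the perturbation, its coupling energy with j is zero, matching the article's example.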
