Abstract
A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models having discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class having the most probable model. The secondary structure of the sequence is then analyzed using a "smoothing" algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the totality of the amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical alpha/beta protein having a central beta-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
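The classify-then-smooth procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two model names (`alternating`, `blocky`), the two-letter alphabet (`h` for hydrophobic, `p` for polar), the equal priors, and all probability values are hypothetical stand-ins for the structural class models and the 20-letter amino acid alphabet. The forward pass plays the role of the filtering algorithm that yields each model's likelihood, and the forward-backward combination plays the role of the smoother.

```python
import math

def forward_backward(seq, init, trans, emit):
    """Scaled forward-backward pass for a discrete hidden Markov model.

    Returns (log_likelihood, posteriors), where posteriors[t][s] is the
    probability of state s at position t given the whole sequence.
    """
    n, k = len(seq), len(init)
    # Forward (filtering) pass, normalised at each position for stability.
    alpha, log_lik = [], 0.0
    for t, sym in enumerate(seq):
        if t == 0:
            row = [init[s] * emit[s][sym] for s in range(k)]
        else:
            prev = alpha[-1]
            row = [sum(prev[r] * trans[r][s] for r in range(k)) * emit[s][sym]
                   for s in range(k)]
        scale = sum(row)
        log_lik += math.log(scale)
        alpha.append([x / scale for x in row])
    # Backward pass; each row is rescaled, which cancels in the final step.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [sum(trans[s][r] * emit[r][seq[t + 1]] * beta[t + 1][r]
                   for r in range(k)) for s in range(k)]
        scale = sum(row)
        beta[t] = [x / scale for x in row]
    # Smoothed posteriors: combine both passes and renormalise per position.
    posteriors = []
    for a, b in zip(alpha, beta):
        row = [x * y for x, y in zip(a, b)]
        total = sum(row)
        posteriors.append([x / total for x in row])
    return log_lik, posteriors

def classify_and_smooth(seq, models, priors):
    """Pick the most probable model, then return its smoothed posteriors."""
    results = {name: forward_backward(seq, *m) for name, m in models.items()}
    log_joint = {name: math.log(priors[name]) + results[name][0]
                 for name in models}
    top = max(log_joint.values())
    norm = sum(math.exp(v - top) for v in log_joint.values())
    model_post = {name: math.exp(v - top) / norm
                  for name, v in log_joint.items()}
    best = max(model_post, key=model_post.get)
    return best, model_post, results[best][1]

# Hypothetical toy models (assumed values, for illustration only).
# State 0 prefers hydrophobic residues, state 1 prefers polar residues.
EMIT = [{"h": 0.8, "p": 0.2}, {"h": 0.2, "p": 0.8}]
MODELS = {
    # Each entry is (initial distribution, transition matrix, emissions).
    "alternating": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMIT),
    "blocky": ([0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]], EMIT),
}
PRIORS = {"alternating": 0.5, "blocky": 0.5}

best, model_post, residue_post = classify_and_smooth(
    "hphphphphp", MODELS, PRIORS)
```

Here `model_post` holds the posterior probability of each candidate model given the sequence, and `residue_post[t][s]` is the probability that position `t` lies in state `s` of the selected model. Because the smoother runs inside a single class model, the per-residue probabilities stay consistent with the state transitions that model allows, which mirrors the second advantage claimed in the abstract.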