The Journal of Biological Chemistry. 2008 May 2;283(18):12604–12613. doi: 10.1074/jbc.M709865200

Divergent Modes of Glycan Recognition by a New Family of Carbohydrate-binding Modules

Katie J Gregg, Ron Finn, D Wade Abbott, Alisdair B Boraston
PMCID: PMC2335362  PMID: 18292090

Abstract

The genomes of myonecrotic Clostridium perfringens isolates contain genes encoding a large and fascinating array of highly modular glycoside hydrolase enzymes. Although the catalytic activities of many of these enzymes are somewhat predictable based on their amino acid sequences, the functions of their abundant ancillary modules are not and remain poorly studied. Here, we present the structural and functional analysis of a new family of ancillary carbohydrate-binding modules (CBMs), CBM51, which was previously annotated in data bases as the novel putative CBM domain. The high resolution crystal structures of two CBM51 members, GH95CBM51 and GH98CBM51, from a putative family 95 α-fucosidase and from a family 98 blood group A/B antigen-specific endo-β-galactosidase, respectively, showed them to have highly similar β-sandwich folds. However, GH95CBM51 was shown by glycan microarray screening, isothermal titration calorimetry, and x-ray crystallography to bind galactose residues, whereas the same analyses of GH98CBM51 revealed specificity for the blood group A/B antigens through non-conserved interactions. Overall, this work identifies a new family of CBMs with many members having apparent specificity for eukaryotic glycans, in keeping with the glycan-rich environment C. perfringens would experience in its host. However, a wider bioinformatic analysis of this CBM family also indicated a large number of members in non-pathogenic environmental bacteria, suggesting a role in the recognition of environmental glycans.


Carbohydrates have critical functions in numerous biological events, including, for example, the movement and interactions of cells and proteins in animals, the recycling of plant cell wall carbohydrates, and the interactions between hosts and disease-causing organisms. Central to the role of carbohydrates in biological processes are protein-carbohydrate interactions. Non-catalytic carbohydrate-binding proteins (e.g. lectins, antibodies, and transport proteins) and catalytic carbohydrate-active enzymes are finely tuned to recognize particular carbohydrate structural motifs. The information content of glycans is realized through the specificity of non-catalytic carbohydrate-binding proteins (like lectins and antibodies), whereas carbohydrate-active enzymes change the information content and often unlock the energy contained within these molecules.

Carbohydrate-binding modules (CBMs) are a comparatively new class of non-catalytic carbohydrate-recognizing polypeptides that are generally defined by their presence as ancillary modules in larger, multimodular carbohydrate-active enzymes such as glycoside hydrolases, glycosyltransferases, and polysaccharide lyases (1). In the context of these enzymes, the role of CBMs is to specifically bind the carbohydrate substrate and hold the enzyme in proximity to the substrate, allowing catalysis to proceed more efficiently (2). The number of CBM families, which are defined on the basis of amino acid sequence similarity, has grown to the current number of 50 (www.cazy.org). Although the majority of these families are known to have members that recognize primarily plant cell wall polysaccharides, a growing number of families appear to have specificity for animal glycans and appear within the modular structures of carbohydrate-active bacterial virulence factors (3-6). In these cases, the CBMs appear to have a role in recognizing the information content of cellular glycans, which then allows the enzymatic activity of the virulence factor to be appropriately directed. One of the current challenges in this area, which is complicated by the great diversity of eukaryotic glycans, is to determine the specificity, strength, and molecular determinants of these CBM-glycan interactions. In turn, this provides key information about the cellular targets of the entire carbohydrate-active virulence factor, thus providing greater insight into the host-pathogen interaction.

Clostridium perfringens is a ubiquitous Gram-positive bacterium that is capable of causing an array of diseases such as gastroenteritis and gas gangrene in humans and animals. The genus Clostridium (and the species perfringens in particular) is notable for its prolific production of toxins that contribute to its virulence (7, 8). Among its minor toxins are a battery of carbohydrate-active enzymes of the class glycoside hydrolase, which are thought to aid in major toxin delivery, tissue destruction, and nutrient harvesting while in the host. The two myonecrotic strains of C. perfringens for which the genome sequences are available are remarkable for their content of highly modular glycoside hydrolases (9, 10). Each strain contains in excess of 50 open reading frames encoding putative glycoside hydrolases, many of which are extracellular and rarely smaller than 1000 amino acids in length and frequently contain, in addition to the catalytic module, three or more definable ancillary modules (3). These ancillary modules include CBMs (3, 4), putative protein-protein interaction domains (11), and modules of unknown function. One such unknown module, which occurs quite frequently in the C. perfringens glycoside hydrolases as well as in proteins from a variety of other organisms, is the novel putative CBM (NPCBM) domain. NPCBM domains show no amino acid sequence identity to known carbohydrate-binding proteins but were hypothesized by Rigden (12) to be CBMs based on their context (i.e. in carbohydrate-active enzymes) and their proposed β-sandwich fold, which is common to CBMs. To test the hypothesized carbohydrate-binding function of these modules, we performed structural and functional studies on the NPCBMs from two different multimodular clostridial enzymes, one a hypothesized family 95 α-l-fucosidase (CPF_2129 from C. perfringens strain ATCC 13124) and the other a confirmed family 98 blood group A/B antigen-specific endo-β-d-galactosidase (EabC from C. perfringens strain ATCC 10543) (Fig. 1) (13). The two targeted modules share only ∼30% amino acid sequence identity. The results confirm the identification of NPCBMs as a new CBM family, now classified as CBM51, and define a new module that can mediate the carbohydrate-based interaction of the enzymes with host tissues. Furthermore, an unexpected degree of diversity in specificity and binding site architecture is evident in this new CBM family, providing some distinction from other CBM families.

FIGURE 1.

Schematics of the modular arrangements of the GH95 α-l-fucosidase (CPF_2129 from C. perfringens strain ATCC 13124) (A) and the GH98 blood group A/B antigen-specific endo-β-d-galactosidase (EabC from C. perfringens strain ATCC 10543) (B). DOC refers to a module that has amino acid sequence identity to dockerins; UNK refers to unclassified modules of unknown function.

EXPERIMENTAL PROCEDURES

Cloning, Gene Expression, and Protein Production and Purification—The gene fragments encoding GH95CBM51 and GH98CBM51 were PCR-amplified from the genomic DNA of C. perfringens strains ATCC 13124 and ATCC 10543, respectively, using the following sets of primers: GH95CBMF (5′-CAT ATG GCT AGC GAA AAG GTT GCA GTT G-3′) and GH95CBMR (5′-GAA TTC CTC GAG TTA TGT TAA CTT AGC G-3′) for GH95CBM51 and GH98CBMF (5′-CAT ATG GCT AGC GAA GTT TAT GCT TTG GAA GAA AGC G-3′) and GH98CBMR (5′-GAA TTC CTC GAG TTA ATT CAC AAA ATC ACC CTT AGC TGT C-3′) for GH98CBM51. The amplified products were digested with NheI and XhoI restriction endonucleases and ligated to like digested pET-28a using standard cloning procedures. The resulting plasmids, pGH95CBM and pGH98CBM, encode the desired CBM fused to an N-terminal His6 tag by a thrombin protease cleavage site.
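
The cloning strategy above depends on the NheI and XhoI recognition sequences carried near the 5′ ends of the primers. As a quick sanity check, the following minimal Python sketch locates those sites in the four primer sequences quoted above. The recognition sequences (GCTAGC for NheI, CTCGAG for XhoI) are the standard ones; the helper name `find_site` is hypothetical and used only for illustration.

```python
# Primer sequences exactly as quoted in the text (spaces are removed in find_site).
PRIMERS = {
    "GH95CBMF": "CAT ATG GCT AGC GAA AAG GTT GCA GTT G",
    "GH95CBMR": "GAA TTC CTC GAG TTA TGT TAA CTT AGC G",
    "GH98CBMF": "CAT ATG GCT AGC GAA GTT TAT GCT TTG GAA GAA AGC G",
    "GH98CBMR": "GAA TTC CTC GAG TTA ATT CAC AAA ATC ACC CTT AGC TGT C",
}

# Standard recognition sequences of the two restriction endonucleases used.
SITES = {"NheI": "GCTAGC", "XhoI": "CTCGAG"}


def find_site(primer: str, site: str) -> int:
    """Return the 0-based index of `site` in `primer`, or -1 if it is absent."""
    return primer.replace(" ", "").upper().find(site)


for name, seq in PRIMERS.items():
    for enzyme, site in SITES.items():
        pos = find_site(seq, site)
        if pos >= 0:
            print(f"{name}: {enzyme} site ({site}) starts at index {pos}")
```

In this sketch each forward primer should report an NheI site and each reverse primer an XhoI site, matching the digestion step described above.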

Polypeptides were produced in 4-liter cultures of Escherichia coli strain BL21(DE3) containing pGH95CBM and pGH98CBM using the methods described previously (3). GH95CBM51 and GH98CBM51 were purified by immobilized metal affinity chromatography from cell-free extracts following previously described procedures (3). Purified polypeptides were concentrated and exchanged into 20 mM Tris-HCl (pH 8.0) in a stirred ultrafiltration unit (Amicon, Beverly, MA) using a 5-kDa cutoff membrane (Filtron Corp., Northborough, MA). Purity as assessed by SDS-PAGE was >95%.

Selenomethionine-labeled GH95CBM51 was produced using the E. coli B834(DE3) methionine auxotroph. E. coli colonies taken from an LB-agar plate were used to inoculate 3 liters of SelenoMet Medium Base (Molecular Dimensions Ltd.) supplemented with SelenoMet Nutrient Mix (Molecular Dimensions Ltd.) and l-selenomethionine (40 mg/liter). These cultures were grown, induced, and harvested, and the polypeptide was purified exactly as described for the unlabeled protein.

Determination of Protein Concentration—The concentration of purified protein was determined by UV absorbance (280 nm) using calculated molar extinction coefficients of 17,420 and 24,410 M⁻¹ cm⁻¹ for GH95CBM51 and GH98CBM51, respectively.
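
The concentration determination above follows the Beer-Lambert law, c = A/(ε·l). A minimal Python sketch of that arithmetic is given below. The extinction coefficients are those stated in the text; the 1-cm path length and the example absorbance reading are assumptions for illustration, and the function name `molar_concentration` is hypothetical.

```python
# Calculated molar extinction coefficients at 280 nm from the text (M^-1 cm^-1).
EXTINCTION_280 = {"GH95CBM51": 17420.0, "GH98CBM51": 24410.0}


def molar_concentration(a280: float, protein: str, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: concentration (mol/L) = A280 / (epsilon * path length).

    The default 1-cm path length is an assumption; it is not stated in the text.
    """
    return a280 / (EXTINCTION_280[protein] * path_cm)


# Hypothetical absorbance reading, for illustration only.
conc = molar_concentration(0.50, "GH95CBM51")
print(f"GH95CBM51 at A280 = 0.50: {conc * 1e6:.1f} uM")
```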

Binding Studies—Proteins for glycan array screening were labeled with fluorescein isothiocyanate (Invitrogen) according to the manufacturer's directions. Labeled protein was desalted and separated from free fluorescein isothiocyanate by gel filtration chromatography using Sephadex G-25 (GE Healthcare). Fluorescein isothiocyanate-labeled GH95CBM51 and GH98CBM51 were used to probe the printed glycan arrays following the standard procedure of Core H of the Consortium for Functional Glycomics (www.functionalglycomics.org/).

Isothermal titration calorimetry was performed as described previously using a VP-ITC system (MicroCal, Northampton, MA) (3). Protein samples were dialyzed extensively against 50 mM Tris-HCl (pH 7.5) and 1 mM CaCl₂ and then concentrated in a stirred ultrafiltration cell as described above. Sugar solutions were prepared by mass in buffer saved from the ultrafiltration step. Both protein and sugar solutions were filtered and degassed immediately prior to use. Protein concentrations were determined by UV absorbance as described above. Although the concentrations of acceptor (i.e. GH95CBM51 or GH98CBM51) were quite high (between 200 and 500 μM), the low affinities of the interactions resulted in C values <2 (14). Thus, based on the 1:1 binding observed in the crystal structures, the data were fit with a single binding site model using stoichiometries fixed at 1. The association constants (Ka) were determined by this fitting process.
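
The low C values noted above can be checked with the usual definition C = n·Ka·[M], where [M] is the concentration of acceptor in the cell. The sketch below is illustrative only: it combines the association constants reported in the Results with the two ends of the stated 200-500 μM concentration range. Which concentration was used for which titration is not stated, so the pairing here is an assumption, and the function name `c_value` is hypothetical.

```python
# Association constants (M^-1) as reported in the Results section.
KA = {
    "GH95CBM51 / Galb1-3GalNAc": 1.48e3,
    "GH95CBM51 / methyl-b-D-Gal": 0.48e3,
    "GH98CBM51 / A tetrasaccharide": 3.85e3,
    "GH98CBM51 / B tetrasaccharide": 1.30e3,
}

# Ends of the acceptor concentration range given in the text (mol/L).
CONC_RANGE_M = (200e-6, 500e-6)


def c_value(ka: float, acceptor_conc_m: float, n: float = 1.0) -> float:
    """Dimensionless ITC C value: n * Ka * [acceptor]."""
    return n * ka * acceptor_conc_m


for label, ka in KA.items():
    low, high = (c_value(ka, c) for c in CONC_RANGE_M)
    print(f"{label}: C between {low:.2f} and {high:.2f}")
```

Under these assumptions every combination stays below 2, which is consistent with the decision to fix the stoichiometry at 1 during fitting.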

Crystallization, Data Collection, and Structure Solution—GH95CBM51 and GH98CBM51, previously exchanged into 20 mM Tris (pH 8.0), were treated overnight at room temperature with thrombin to remove the His6 tag. The polypeptides were separated from the cleaved His6 tag by size exclusion chromatography using a Sephacryl S-200 column (GE Healthcare). Pure fractions were concentrated in a 10-ml stirred ultrafiltration device using a 5-kDa cutoff membrane. All crystallizations were performed by the hanging drop vapor diffusion method at 18 °C. Diffraction data for all crystals except the selenomethionine-substituted GH95CBM51 crystals were collected on our home source: a Rigaku R-AXIS 4++ area detector coupled to an MM-002 x-ray generator with Osmic Blue optics and an Oxford 700 series Cryostream. All data were processed using Crystal Clear/d*trek (15). Data collection and processing statistics are given in Table 1.

TABLE 1.

X-ray data collection and refinement statistics

SeMet, selenomethionine; r.m.s., root mean square.

Parameter | GH98CBM51, unbound | GH98CBM51, A trisaccharide | GH98CBM51, B trisaccharide | GH95CBM51, SeMet (unbound) | GH95CBM51, unbound | GH95CBM51, methyl-β-d-Gal
Data collection
Space group | P2₁2₁2₁ | P2₁2₁2 | P2₁2₁2 | P6₅ | P6₅ | P2₁
Cell dimensions a, b, c (Å) | 50.72, 55.48, 65.68 | 69.68, 98.75, 49.24 | 69.68, 98.75, 49.24 | 76.72, 76.72, 51.77 | 76.78, 76.78, 51.82 | 37.47, 47.95, 38.70
Cell dimensions α, β, γ | 90°, 90°, 90° | 90°, 90°, 90° | 90°, 90°, 90° | 90°, 90°, 120° | 90°, 90°, 120° | 90°, 92°, 90°
Resolution (Å) | 20.00-1.55 (1.61-1.55) | 20.00-1.40 (1.45-1.40) | 20.00-1.45 (1.50-1.45) | 20.00-1.70 (1.76-1.70) | 20.00-1.50 (1.55-1.50) | 20.00-1.90 (1.97-1.90)
Rmerge | 0.055 (0.364) | 0.035 (0.316) | 0.043 (0.349) | 0.138 (0.332) | 0.059 (0.398) | 0.077 (0.299)
I/σI | 13.3 (2.7) | 18.1 (2.7) | 14.6 (2.9) | 15.5 (6.5) | 13.4 (3.4) | 8.6 (2.5)
Completeness (%) | 97.0 (99.4) | 89.6 (76.3) | 95.6 (97.7) | 99.9 (98.6) | 99.8 (99.9) | 96.4 (93.2)
Redundancy | 4.0 (3.3) | 3.4 (2.2) | 3.9 (3.2) | 19.8 (10.3) | 6.5 (5.2) | 3.7 (3.6)
Refinement
Resolution (Å) | 20.00-1.55 | 20.00-1.40 | 20.00-1.45 | 20.00-1.70 | 20.00-1.50 | 20.00-1.90
No. reflections | 25,390 | 57,807 | 55,064 | 18,188 | 26,520 | 10,010
B-factor model | Isotropic | Anisotropic | Anisotropic | Isotropic | Anisotropic | Isotropic
Rwork/Rfree | 19.7/23.4 | 18.6/23.5 | 18.2/22.6 | 15.0/17.8 | 14.8/18.4 | 23.3/29.5
No. atoms
Protein | 1344 | 1356 (A), 1372 (B) | 1347 (A), 1366 (B) | 1109 | 1116 | 1100
Ligand/ion | 1 Ca²⁺ | 2 Ca²⁺, 72 sugar | 2 Ca²⁺, 72 sugar | 1 Ca²⁺ | 1 Ca²⁺ | 1 Ca²⁺, 13 sugar
Water | 256 | 585 | 576 | 132 | 162 | 146
B-factors
Protein | 17.4 | 16.7 (A), 17.0 (B) | 15.8 (A), 16.0 (B) | 11.8 | 15.0 | 23.8
Ligand/ion | 13.0 | 16.1 (Ca²⁺), 26.5 (sugar) | 13.5 (Ca²⁺), 18.4 (sugar) | 7.5 | 11.4 | 19.7 (Ca²⁺), 18.8 (sugar)
Water | 30.7 | 30.8 | 28.4 | 23.2 | 28.8 | 30.6
r.m.s. deviations
Bond lengths (Å) | 0.020 | 0.022 | 0.022 | 0.020 | 0.019 | 0.007
Bond angles | 1.695° | 1.881° | 1.821° | 1.602° | 1.603° | 0.937°

Crystals of native and selenomethionine-labeled GH95CBM51 (both at 15 mg/ml) were grown in 0.05 M calcium chloride, 0.1 M sodium acetate (pH 4.6), and 20% polyethylene glycol 3350. An optimized selenium single anomalous dispersion diffraction data set for selenomethionine-labeled GH95CBM51 was collected on beamline X6A at the National Synchrotron Light Source (Brookhaven National Laboratories). SHELXC/D was used to determine the substructure of three selenium atoms, followed by refinement and phasing with SHARP (16). Solvent flattening with DM resulted in easily interpretable electron density maps (17). Automatic model building with ARP/wARP yielded a virtually complete model that was finished by manual model building using COOT and refinement with REFMAC (18-20). This selenium-substituted model was used as a starting point for the higher resolution native structure. GH95CBM51 at 45 mg/ml was co-crystallized with 50 mM methyl-β-d-galactose in 0.2 M magnesium acetate, 0.1 M sodium acetate (pH 4.8), and 16% polyethylene glycol 3350. This structure was solved by molecular replacement using MOLREP (21) and the native GH95CBM51 model as a template.

Crystals of GH98CBM51 at 15 mg/ml were grown in 0.2 M magnesium acetate, 0.1 M HEPES (pH 7.5), and 20% polyethylene glycol 3350. This structure was solved by molecular replacement using the coordinates of GH95CBM51 as a search model and MOLREP to find the one molecule in the asymmetric unit. After manual correction of this model, ARP/wARP was able to build a model that required minimum alteration using COOT and refinement with REFMAC. A crystal form of GH98CBM51 obtained in 0.2 M sodium carbonate, 0.1 M HEPES (pH 7.5), and 20% polyethylene glycol 3350 was soaked with the blood group A and B antigen trisaccharides (Dextra Laboratories) at 5 mM in mother liquor to obtain complexes with these sugars. This process resulted in the alteration of the unit cell dimensions and space group from a = 36.42, b = 49.54, and c = 88.45 Å (P2₁2₁2₁; structure not reported, as it did not yield any new information) to a = 69.68, b = 98.75, and c = 49.24 Å (P2₁2₁2) but nevertheless yielded excellent complexes. The new asymmetric unit of the soaked crystals contained two molecules of GH98CBM51, each bound to carbohydrate, which were located with the molecular replacement program MOLREP. Building and refinement were carried out as described above.

Water molecules were added to all models using the REFMAC implementation of ARP/wARP (18) and inspected visually prior to deposition. In all data sets, 5% of the observations were flagged as “free” (22) and used to monitor refinement procedures. All final model statistics are given in Table 1. Structure images were prepared with PyMOL.

Bioinformatic Analysis—Putative CBM51 domains were detected by position-specific iterative BLAST searches (23) using both GH95CBM51 and GH98CBM51 as queries. Polypeptides containing similar amino acid sequences were dissected for modularity by InterProScan (24) and classified when possible based on the predicted enzymatic specificity for an appended catalytic module. In some cases, boundary positions were fine-tuned by direct sequence alignments with ClustalW (25). For enzymes that contained tandem copies of CBM51, each module was treated independently. Sequence entries were cut off at >20% identity over a minimum of 100 amino acids. Phylogenetic tree construction was performed with PhyloDraw Version 0.8 (26).
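
The inclusion criterion described above (>20% identity over a minimum of 100 amino acids) can be expressed as a simple filter. The following Python sketch is illustrative only: the hit records are invented placeholders rather than real search results, and the names `Hit` and `passes_cutoff` are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One candidate sequence from a similarity search (hypothetical record layout)."""
    name: str
    percent_identity: float  # identity to the query, in percent
    aligned_length: int      # number of aligned amino acids


def passes_cutoff(hit: Hit, min_identity: float = 20.0, min_length: int = 100) -> bool:
    """Keep hits with >20% identity over at least 100 aligned amino acids."""
    return hit.percent_identity > min_identity and hit.aligned_length >= min_length


# Invented placeholder hits, for illustration only.
hits = [
    Hit("hypothetical_A", 31.0, 142),  # passes both criteria
    Hit("hypothetical_B", 18.5, 150),  # identity too low
    Hit("hypothetical_C", 45.0, 80),   # alignment too short
]

kept = [h.name for h in hits if passes_cutoff(h)]
print(kept)
```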

RESULTS AND DISCUSSION

A New Family of Carbohydrate-binding Modules—To facilitate the characterization of NPCBMs in isolation, the gene fragments encoding the modules were cloned and overexpressed in E. coli, and the polypeptides were purified by immobilized metal affinity chromatography. The purified polypeptides, here called GH95CBM51 (from C. perfringens strain ATCC 13124 GH95) and GH98CBM51 (from C. perfringens strain ATCC 10543 GH98), were screened for glycan binding by Core H of the Consortium for Functional Glycomics using glycan microarrays. Both proteins displayed the ability to recognize glycans (Fig. 2). GH98CBM51 was clearly the more specific module, showing binding only to glycans bearing the blood group A or B antigen trisaccharide determinants (Fig. 2B). This specificity is in keeping with the blood group antigen specificity of GH98 (EabC) (13). In contrast, GH95CBM51 appeared to interact with a large number of the glycans presented in this array. The common determinant of the best hits was terminal galacto-configured sugars, either d-galactose or d-N-acetylgalactosamine (GalNAc) (Fig. 2A). The exceptions contained galactose following terminal α-2,6-linked sialic acid or α-1,2-linked fucose. GH95 is a predicted α-fucosidase; thus, the binding specificity of its CBM, GH95CBM51, is somewhat at odds with the predicted catalytic specificity. The biological significance of this is currently unknown.

FIGURE 2.

Glycan microarray binding data for GH95CBM51 (A) and GH98CBM51 (B). Glycans giving substantial signals are indicated with their structures. Yellow symbols indicate galacto-configured monosaccharides; blue symbols indicate gluco-configured monosaccharides; and green symbols indicate manno-configured monosaccharides. Circles represent hexose monosaccharides, and squares indicate their 2-acetamido derivatives. Purple diamonds indicate sialic acid, and red triangles represent l-fucose.

The carbohydrate-binding activities of GH95CBM51 and GH98CBM51 were confirmed and quantified by isothermal titration calorimetry. GH95CBM51 bound Galβ1-3GalNAc and methyl-β-d-Gal with association constants of (1.48 ± 0.05) × 10³ and (0.48 ± 0.04) × 10³ M⁻¹, respectively. GH98CBM51 bound the A antigen tetrasaccharide (GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ-CH₂-CH₂-N₃) and B antigen tetrasaccharide (Galα1-3(Fucα1-2)Galβ1-4GlcNAcβ-CH₂-CH₂-N₃) with association constants of (3.85 ± 0.25) × 10³ and (1.30 ± 0.01) × 10³ M⁻¹, respectively. In general, these affinities are on the lower end of the affinity range for protein-carbohydrate interactions. The higher affinity of GH95CBM51 for the disaccharide ligand Galβ1-3GalNAc versus the methylated galactose monosaccharide suggests additional interactions with the second sugar residue. GH98CBM51 showed a slight (2-3-fold) preference for the blood group A antigen tetrasaccharide in comparison with the blood group B antigen tetrasaccharide. This is consistent with the smaller observed signal for the blood group B antigen-containing glycans in the microarray binding analysis (Fig. 2B). Overall, these results demonstrate that these two examples of NPCBM domains are indeed carbohydrate-binding modules, and this family of modules has now been reclassified as CBM51.
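
To put these association constants on a more familiar scale, they can be converted to dissociation constants (Kd = 1/Ka) and to standard binding free energies (ΔG° = −RT ln Ka). The Python sketch below does this for the four reported values. The temperature of 298.15 K is an assumption, since the titration temperature is not stated here, and the function names are hypothetical.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1

# Association constants (M^-1) reported in the text.
KA = {
    "GH95CBM51 / Galb1-3GalNAc": 1.48e3,
    "GH95CBM51 / methyl-b-D-Gal": 0.48e3,
    "GH98CBM51 / A tetrasaccharide": 3.85e3,
    "GH98CBM51 / B tetrasaccharide": 1.30e3,
}


def kd_millimolar(ka: float) -> float:
    """Dissociation constant in mM from an association constant in M^-1."""
    return 1e3 / ka


def delta_g_kj_per_mol(ka: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy, -RT ln(Ka), in kJ/mol (temperature assumed)."""
    return -R_GAS * temp_k * math.log(ka) / 1000.0


for label, ka in KA.items():
    print(f"{label}: Kd = {kd_millimolar(ka):.2f} mM, "
          f"dG = {delta_g_kj_per_mol(ka):.1f} kJ/mol")

# Fold preference of GH98CBM51 for the A over the B antigen tetrasaccharide.
fold_a_over_b = KA["GH98CBM51 / A tetrasaccharide"] / KA["GH98CBM51 / B tetrasaccharide"]
print(f"A/B preference: {fold_a_over_b:.1f}-fold")
```

All four dissociation constants fall in the sub-millimolar to low-millimolar range, which is what places these interactions at the lower end of the affinity range noted above.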

CBM51 Displays a Common Carbohydrate-binding Fold—To identify the molecular determinants of specificity in these family 51 CBMs (NPCBMs), we turned to x-ray crystallography. The x-ray crystal structures of GH95CBM51 and GH98CBM51 were solved by optimized selenium single anomalous dispersion and molecular replacement, respectively. Both modules adopt a β-sandwich fold comprising a five-stranded antiparallel β-sheet opposing a four-stranded antiparallel β-sheet (Fig. 3, A and B). The β-sandwich fold is common to numerous families of carbohydrate-binding proteins, including a number of CBM and lectin families (1). The atomic models of GH95CBM51 and GH98CBM51 are clearly similar and overlap with a root mean square deviation of 1.2 Å over 140 matched residues (measured by the secondary structure matching algorithm (27) as implemented in COOT). The primary difference in the structures of these two polypeptides is a small insertion in GH98CBM51, which results in the addition of a small α-helix near the binding site (Fig. 3B). Relative to GH95CBM51, this addition in GH98CBM51 appears to play a role in recontouring the binding site to accommodate the larger blood group A/B carbohydrates.

FIGURE 3.

Fold and calcium binding of CBM51. Shown are color-ramped schematic representations of GH95CBM51 with methyl-β-d-galactose (A) and GH98CBM51 with the B antigen trisaccharide (B), both with bound calcium atoms shown as pink spheres and ligands in stick representation. C shows a schematic representation of CBM6-1 from Clostridium stercorarium (Protein Data Bank code 1uy4) (34), with its bound calcium atom shown as a pink sphere and its bound xylotetraose ligand shown in stick representation. This structure is a representative β-sandwich CBM showing the common calcium-binding site.

Both polypeptides bind a metal atom (Fig. 3, A and B) that was judged to be Ca²⁺ based on the strictly oxygen-mediated coordination, coordination bond lengths that range from 2.25 to 2.40 Å, a B-factor when modeled as Ca²⁺ that is consistent with the neighboring atoms, and a small peak of anomalous signal that overlaps with the position of this atom (data not shown). The position of this atom is conserved between GH95CBM51 and GH98CBM51 (within 0.6 Å in a structural overlap) but is not conserved in other CBM families. In the case of these family 51 CBMs, the metal-binding site is at the interface of the two β-sheets (the edge of the sandwich) and in the loops that join the termini of the β-strands, near the carbohydrate-binding site (Fig. 3, A and B). In contrast, the majority of other CBM families that bind metal atoms, which notably are all also β-sandwich proteins, do so at a site that is also at the edge of the sandwich but on a side such that the binding site is on the edge of the strands rather than at the ends (Fig. 3C).

GH95CBM51 Has a Simple Galactose-binding Site—A ligand-bound form of GH95CBM51 was obtained by co-crystallization of the protein with excess methyl-β-d-galactose. The electron density for this sugar was very clear, allowing unambiguous modeling of the ligand (Fig. 4A).

FIGURE 4.

Carbohydrate-binding sites of GH95CBM51 and GH98CBM51. A, solvent-accessible surface representation of the GH95CBM51 binding site containing methyl-β-d-galactose. The sugar is shown in green stick representation with maximum likelihood (20)/σA (35) weighted 2Fo - Fc and Fo - Fc electron density maps contoured at 0.13 electrons/Å³ (1σ) and 0.08 electrons/Å³ (2.5σ), respectively. B, binding site showing the bound sugar in green stick representation and side chains involved in binding the sugar in blue stick representation. Potential hydrogen bonds are shown as purple dashed lines. C, overlay of the GH95CBM51 binding site (yellow) and the binding site of CpCBM32 (magenta) from C. perfringens GH84C (NagJ). Structurally conserved residues involved in binding are shown in stick representation. The numbering indicates the residues in CpCBM32. The residues shown for GH95CBM51 are the same as those indicated in B. D, solvent-accessible surface representation of the A antigen trisaccharide in the GH98CBM51 binding site. The sugar is shown in green stick representation with maximum likelihood (20)/σA (35) weighted 2Fo - Fc and Fo - Fc electron density maps contoured at 0.16 electrons/Å³ (0.9σ) and 0.09 electrons/Å³ (2.5σ), respectively. E, solvent-accessible surface representation of the B antigen trisaccharide in the GH98CBM51 binding site. Electron density maps are maximum likelihood/σA weighted 2Fo - Fc and Fo - Fc maps contoured at 0.16 electrons/Å³ (1.0σ) and 0.08 electrons/Å³ (2.5σ), respectively. F, binding site showing the bound A antigen trisaccharide in green stick representation and side chains involved in binding the sugar in blue stick representation. Potential hydrogen bonds are shown as purple dashed lines. The pattern of interactions is identical for the B antigen trisaccharide and is not shown.

The binding site of GH95CBM51 is quite shallow (Fig. 4A) and provides surprisingly few direct interactions with the sugar (Fig. 4B). The ε-nitrogens of His955 and His1041 make hydrogen bonds with the sugar O-4 and O-3, respectively (Fig. 4B). The galactose O-4 also makes a hydrogen bond with the backbone nitrogen of Ser1039 (Fig. 4B). The phenol group of Tyr922 makes a classical protein-carbohydrate hydrophobic interaction with the apolar plane formed by C-3, C-4, C-5, and C-6 on the B-face of the galactose. This mode of interaction, which requires an equatorial O-2, an equatorial O-3, and an axial O-4, is consistent with the observed specificity of GH95CBM51 for terminal galacto-configured sugars. However, the galactose O-2 and O-6 are solvent-exposed, suggesting that modifications may be tolerated at these positions. Indeed, the array screening results indicated that this polypeptide can interact with terminal GalNAc (i.e. can tolerate an acetamido modification at C-2) and suggested an ability to accommodate a fucosyl residue α-1,2-linked to galactose. Likewise, the exposure of the O-6 group may also explain the apparent binding of GH95CBM51 to a glycan terminating in sialic acid α-2,6-linked to galactose.

The galactose-binding site of GH95CBM51 shows a reasonable degree of architectural similarity to the binding sites of the galactose-specific family 32 CBMs, the best characterized of which is CpCBM32, the CBM32 from C. perfringens GH84C (NagJ) (3). Although GH95CBM51 and CpCBM32 show no significant amino acid sequence similarity, they share the same fold (root mean square deviation of ∼3.3 Å) and binding site location (Fig. 4C). Many of the protein-carbohydrate interactions are also conserved. Tyr922 of GH95CBM51 is structurally conserved with Trp661 of CpCBM32, which plays a role in binding similar to Tyr922. His955 is conserved with His658, which, like His955, hydrogen bonds with O-4 of the galactose. In CpCBM32, the Nη of Arg690 provides the same hydrogen bond to O-3 of the galactose as the Nε of His1041 in GH95CBM51. Thus, the majority of the GH95CBM51-galactose interactions are structurally well conserved with the family 32 CBMs, perhaps reflecting a simple but prototypical mode of galactose recognition by CBMs.

Structural Basis of Blood Group Antigen Recognition—Ligand-bound forms of GH98CBM51 with blood group A and B carbohydrates were obtained by crystal soaking experiments. In both cases, the electron density for the carbohydrate ligands was quite clear, facilitating accurate modeling of these sugars (Fig. 4, D and E). Only in the case of the 2-acetamido group of the GalNAc in the blood group A antigen trisaccharide was there any ambiguity, as there was no clear electron density for C-8 of this residue. This likely reflected a certain degree of disorder in this portion of the molecule, as this observation was made for the A antigen trisaccharide ligands in the binding sites of both molecules of GH98CBM51 in the asymmetric unit.

The ligand specificity of GH98CBM51 is substantially more restricted in comparison with GH95CBM51, which is revealed by the architecture of its binding site. It is comparatively deep (Fig. 4, D and E), and specific interactions are made between the protein and all three sugars of the blood group carbohydrates (Fig. 4F). The constellation of interactions between GH98CBM51 and these sugars is identical for both blood groups A and B. The plane formed by C-3, C-4, C-5, and C-6 on the B-face of the central galactose residue makes a hydrophobic interaction with Trp192, whereas O-4 of this sugar hydrogen bonds with Oη of Tyr135. Asp56 hydrogen bonds with O-2 and O-3 of the fucose residue, Lys61 with O-4 of this residue, and Thr58 with O-3 of the fucose residue. Specificity for the A/B blood group is provided by hydrogen bonds between Asp195 and the galactose or GalNAc O-4 and between Thr70 and O-3. No direct interactions are evident involving the 2-acetamido group of the blood group A determinant, GalNAc, or involving O-2 of the blood group B determinant, galactose, explaining the general ability of this polypeptide to bind both sugars. However, GH98CBM51 did display a 2-3-fold preference for the blood group A antigen, for which there is no obvious structural explanation (i.e. no differences in direct interactions are evident). However, it appears unlikely that this small preference is biologically relevant.

GH98CBM51 is the only CBM known to be specific for the blood group A/B antigens. Although we did recently determine the structure of a family 47 CBM from Streptococcus pneumoniae in complex with the gluco analog of the blood group A antigen tetrasaccharide, this sugar was not the preferred ligand for this protein (5). In general, structural information regarding the interaction of other carbohydrate-binding proteins with their natural blood group A/B antigens is quite scarce. Only the structures of two lectins and a viral capsid protein have been determined in complex with A and/or B antigen-reactive tri- or tetrasaccharides (28-30). Interestingly, these structures reveal that there is little conservation in the modes of their carbohydrate recognition. The norovirus capsid protein maintains specific interactions between the protein and the terminal Gal/GalNAc that defines the antigen and the fucose residue (29). It makes no interactions with the central galactose residue and does not utilize the classical aromatic amino acid side chain-sugar ring hydrophobic interaction. Winged bean basic agglutinin I maintains the majority of its interactions, including an aromatic side chain hydrophobic interaction, with the terminal Gal/GalNAc of the antigen but also has some polar interactions with the central galactose residue (30). The fucose residue apparently does not participate in the interaction. Similarly, the fungal galectin CGL2 makes specific interactions with the blood group-determining Gal/GalNAc and the central galactose residue, with which a tryptophan side chain makes an apolar interaction (28). The fucose residue is not involved in blood group recognition by CGL2. GH98CBM51 is unique, as it makes specific interactions with all of the residues in the blood group A/B antigen trisaccharide (the blood group-determining Gal/GalNAc, galactose, and fucose) (Fig. 4, D-F). The glycan microarray screening did not reveal binding to any fragments of this ligand, indicating that this triumvirate of residues is required for binding. Furthermore, a model of GH98CBM51 with the blood group A antigen tetrasaccharide suggests that the GlcNAc residue of this sugar may even make additional interactions with this polypeptide (data not shown). Although the affinity of GH98CBM51 for the blood group A/B antigens is low, it appears that GH98CBM51 may be one of the most selective carbohydrate-binding proteins for the complete fucosylated forms of these sugars.

Diversity in the CBM51 Family—A structural overlap of GH95CBM51 and GH98CBM51 reveals how the binding sites of these two related CBMs are adapted to recognize their different respective ligands. The GalNAc of the blood group A antigen in the GH98CBM51 binding site approximately overlaps with the galactose in the GH95CBM51 binding site (Fig. 5A). Hydrogen bonding to O-3 and O-4 of the sugar is maintained by residues that are roughly structurally conserved, although the nature of the side chains differs substantially: two histidines in GH95CBM51 versus one aspartic acid and one threonine in GH98CBM51. The tyrosine in GH95CBM51 is structurally replaced by an aspartic acid in GH98CBM51, where instead of making hydrophobic interactions with the GalNAc, it is involved in polar interactions with the fucosyl moiety of the blood group antigen. The remaining four binding residues in GH98CBM51 are unique to this module and appear distinctively positioned to accommodate the blood group antigens.

FIGURE 5.

Comparison of GH95CBM51, GH98CBM51, and other CBM51 members. A, overlay of the GH95CBM51 (yellow) and GH98CBM51 (magenta) carbohydrate-binding sites. Relevant side chains involved in binding and ligands are shown in stick representation, and the metal atoms are shown as spheres. B, phylogenetic analysis of CBM51. The inset shows the complete analysis and indicates the subfamilies. Subfamilies CBM51a and CBM51b (circled in the inset) are expanded with detailed entries. The green star denotes the GH98CBM51 entry, and the blue star denotes the GH95CBM51 entry. C, alignment of subfamilies CBM51a (indicated by the green vertical line) and CBM51b (indicated by the blue vertical line). The entry numbering corresponds to that in B. GH98CBM51 and GH95CBM51 are indicated by stars as in B. The secondary structures for GH98CBM51 and GH95CBM51 are shown above and below the alignment, respectively. Yellow arrows denote β-strands, and the red cylinder represents an α-helix. Residues involved in ligand binding by GH98CBM51 and GH95CBM51 are indicated above and below the alignment, respectively, by arrowheads.

The observed lack of conserved functional residues between these CBMs prompted us to undertake a more comprehensive bioinformatic analysis of the CBM51 family. The CBM51 family currently comprises ∼60 entries from ∼46 different proteins from ∼26 different organisms (different strains of a given species are included in this count). These appear to cluster into six different subfamilies (a-f) (Fig. 5B, inset; see supplemental Figs. 1 and 2 for more detail). The subfamily containing GH98CBM51, here called CBM51a, is distinct from the subfamily containing GH95CBM51, here called CBM51b. In addition to GH98CBM51, CBM51a contains four CBMs with ∼60% amino acid sequence identity to GH98CBM51 that originate from putative S. pneumoniae (serotypes 3 and 8) GH98 enzymes (Fig. 5, B and C). The carbohydrate-binding residues are absolutely conserved among the members of CBM51a (Fig. 5C). Furthermore, the catalytic modules of the pneumococcal enzymes share ∼60% amino acid identity with the catalytic module of C. perfringens GH98, suggesting that these enzymes are also blood group A/B antigen-specific and implying overall that subfamily CBM51a has strict blood group A/B antigen specificity.
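The percent identity figures quoted above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned (gaps written as "-") and counts identical residues over the columns where neither sequence has a gap. This denominator is only one of several conventions in use, and the source does not state which one was applied. The function name and the toy sequences are hypothetical and are not taken from any CBM51 entry.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: the inputs are rows of the same alignment, so they have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # columns where those residues match
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy, made-up aligned sequences (not real CBM51 sequences).
toy_a = "MKT-AYLG"
toy_b = "MRTWAY-G"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

In this toy pair, six columns are gap-free in both rows and five of them match, so the sketch reports roughly 83% identity.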

Subfamily CBM51b comprises 10 entries from five enzymes from three strains of C. perfringens (Fig. 5B). One additional member originates from a Lentisphaera araneosa protein. These CBMs have ∼50% amino acid sequence identity to the galactose-binding residues identified in GH95CBM51, suggesting that galactose is a primary binding determinant in this subfamily (Fig. 5C).

An amino acid alignment of subfamilies CBM51a and CBM51b failed to identify any amino acid side chains common to recognizing their sugar ligands (Fig. 5C). Diverse ligand specificity among members of a given CBM family is reasonably common (e.g. CBM2, CBM4, CBM6, and CBM13). However, in these cases, there is usually a reasonably high number of conserved amino acid side chains that are key to binding, often aromatic amino acid side chains, with variation in a second shell of amino acid side chains, frequently polar, that help provide the varied binding specificity (3, 31-33). Thus, the degree of divergence between subfamilies CBM51a and CBM51b is unusual among CBMs.
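The comparison described above, in which binding residues are conserved within each subfamily but not shared between them, can be sketched as a check on alignment columns. The following is an illustrative sketch only: the alignments, entry names, and column indices are invented for the example and do not correspond to the sequences in Fig. 5C, and the helper names are hypothetical.

```python
from typing import Dict, List


def conserved_columns(alignment: Dict[str, str], columns: List[int]) -> List[int]:
    """Return the columns (0-based) where every sequence carries the same non-gap residue."""
    conserved = []
    for col in columns:
        residues = {seq[col] for seq in alignment.values()}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


# Toy, made-up alignments standing in for two subfamilies (same column numbering).
subfamily_a = {"a1": "ADTWH", "a2": "ADTFH"}
subfamily_b = {"b1": "AHYWH", "b2": "AHYLH"}
binding_columns = [1, 2]  # hypothetical ligand-contacting positions

# Conservation within each subfamily, then across both taken together.
within_a = conserved_columns(subfamily_a, binding_columns)
within_b = conserved_columns(subfamily_b, binding_columns)
across_both = conserved_columns({**subfamily_a, **subfamily_b}, binding_columns)

print("conserved within a:", within_a)
print("conserved within b:", within_b)
print("conserved across a and b:", across_both)
```

With these toy inputs, both binding columns are conserved inside each subfamily, yet none is conserved once the two subfamilies are pooled, which mirrors the pattern the text reports for CBM51a and CBM51b.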

The remaining CBM51 subfamilies are uncharacterized. However, it is notable that the galactose-binding residues identified in GH95CBM51 are conserved among these subfamilies (data not shown), suggesting that they may be related to CBM51b by an ability to bind galactose.

Conclusion—The results presented here confirm that the NPCBMs do indeed constitute a new family of CBMs. Bioinformatic analysis of this family showed that its members cluster into several putative subfamilies. Subfamily CBM51a, which includes GH98CBM51, is unique among CBMs in its specificity for the blood group A/B antigens. The only other CBM family that has comparable specificity is CBM47, which contains pneumococcal CBMs with specificity for the Le^y antigen. The entries in the other subfamily, CBM51b, which includes GH95CBM51, originate almost entirely from C. perfringens and appear to be galactose-specific. The other putative subfamilies comprise entries from non-pathogenic environmental bacteria and are of unknown binding specificity. Thus, this family of CBMs appears to have roles in the recognition of host glycans by pathogenic bacteria, in the context of either pathogenesis or colonization, and in the recognition of environmental glycans, perhaps in the classical CBM role of plant cell wall degradation.

Supplementary Material

[Supplemental Data]

Acknowledgments

We are grateful to Cores D and H of the Consortium for Functional Glycomics for providing the blood group A and B carbohydrates used in the isothermal titration calorimetry experiments and performing the glycan array experiments, respectively. We thank Professor Yu-Teh Li for providing the C. perfringens ATCC 10543 genomic DNA.

The atomic coordinates and structure factors (codes 2vmg, 2vmh, 2vmi, 2vnr, 2vng, and 2vno) have been deposited in the Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ (http://www.rcsb.org/).

* This work was supported in part by a grant from the Canadian Institutes of Health Research. The resources and collaborative efforts provided by the Consortium for Functional Glycomics were supported by NIGMS Grant GM62116 from the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The on-line version of this article (available at http://www.jbc.org) contains supplemental Figs. 1 and 2.

This article was selected as a Paper of the Week.

Footnotes

3 The abbreviations used are: CBMs, carbohydrate-binding modules; NPCBM, novel putative CBM; GalNAc, d-N-acetylgalactosamine.

References

  • 1. Boraston, A. B., Bolam, D. N., Gilbert, H. J., and Davies, G. J. (2004) Biochem. J. 382, 769-782
  • 2. Bolam, D. N., Ciruela, A., Mcqueen-Mason, S., Simpson, P., Williamson, M. P., Rixon, J. E., Boraston, A., Hazlewood, G. P., and Gilbert, H. J. (1998) Biochem. J. 331, 775-781
  • 3. Ficko-Blean, E., and Boraston, A. B. (2006) J. Biol. Chem. 281, 37748-37757
  • 4. Boraston, A. B., Ficko-Blean, E., and Healey, M. (2007) Biochemistry 46, 11352-11360
  • 5. Boraston, A. B., Wang, D., and Burke, R. D. (2006) J. Biol. Chem. 281, 35263-35271
  • 6. Lammerts van Bueren, A., Higgins, M., Wang, D., Burke, R. D., and Boraston, A. B. (2007) Nat. Struct. Mol. Biol. 14, 76-84
  • 7. Rood, J. I. (1998) Annu. Rev. Microbiol. 52, 333-360
  • 8. Petit, L., Gibert, M., and Popoff, M. R. (1999) Trends Microbiol. 7, 104-110
  • 9. Shimizu, T., Ohtani, K., Hirakawa, H., Ohshima, K., Yamashita, A., Shiba, T., Ogasawara, N., Hattori, M., Kuhara, S., and Hayashi, H. (2002) Proc. Natl. Acad. Sci. U. S. A. 99, 996-1001
  • 10. Myers, G. S., Rasko, D. A., Cheung, J. K., Ravel, J., Seshadri, R., DeBoy, R. T., Ren, Q., Varga, J., Awad, M. M., Brinkac, L. M., Daugherty, S. C., Haft, D. H., Dodson, R. J., Madupu, R., Nelson, W. C., Rosovitz, M. J., Sullivan, S. A., Khouri, H., Dimitrov, G. I., Watkins, K. L., Mulligan, S., Benton, J., Radune, D., Fisher, D. J., Atkins, H. S., Hiscox, T., Jost, B. H., Billington, S. J., Songer, J. G., McClane, B. A., Titball, R. W., Rood, J. I., Melville, S. B., and Paulsen, I. T. (2006) Genome Res. 16, 1031-1040
  • 11. Chitayat, S., Gregg, K., Adams, J. J., Ficko-Blean, E., Bayer, E. A., Boraston, A. B., and Smith, S. P. (2008) J. Mol. Biol. 375, 20-28
  • 12. Rigden, D. J. (2005) FEBS Lett. 579, 5466-5472
  • 13. Anderson, K. M., Ashida, H., Maskos, K., Dell, A., Li, S. C., and Li, Y.-T. (2005) J. Biol. Chem. 280, 7720-7728
  • 14. Wiseman, T., Williston, S., Brandts, J. F., and Lin, L. N. (1989) Anal. Biochem. 179, 131-137
  • 15. Pflugrath, J. W. (1999) Acta Crystallogr. Sect. D Biol. Crystallogr. 55, 1718-1725
  • 16. Evans, G., and Bricogne, G. (2002) Acta Crystallogr. Sect. D Biol. Crystallogr. 58, 976-991
  • 17. Cowtan, K. D., and Zhang, K. Y. (1999) Prog. Biophys. Mol. Biol. 72, 245-270
  • 18. Perrakis, A., Morris, R., and Lamzin, V. S. (1999) Nat. Struct. Biol. 6, 458-463
  • 19. Emsley, P., and Cowtan, K. (2004) Acta Crystallogr. Sect. D Biol. Crystallogr. 60, 2126-2132
  • 20. Murshudov, G. N., Vagin, A. A., and Dodson, E. J. (1997) Acta Crystallogr. Sect. D Biol. Crystallogr. 53, 240-255
  • 21. Vagin, A., and Teplyakov, A. (2000) Acta Crystallogr. Sect. D Biol. Crystallogr. 56, 1622-1624
  • 22. Brunger, A. T. (1992) Nature 355, 472-475
  • 23. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389-3402
  • 24. Zdobnov, E. M., and Apweiler, R. (2001) Bioinformatics (Oxf.) 17, 847-848
  • 25. Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680
  • 26. Choi, J. H., Jung, H. Y., Kim, H. S., and Cho, H. G. (2000) Bioinformatics (Oxf.) 16, 1056-1058
  • 27. Krissinel, E., and Henrick, K. (2004) Acta Crystallogr. Sect. D Biol. Crystallogr. 60, 2256-2268
  • 28. Walser, P. J., Haebel, P. W., Kunzler, M., Sargent, D., Kues, U., Aebi, M., and Ban, N. (2004) Structure 12, 689-702
  • 29. Cao, S., Lou, Z., Tan, M., Chen, Y., Liu, Y., Zhang, Z., Zhang, X. C., Jiang, X., Li, X., and Rao, Z. (2007) J. Virol. 81, 5949-5957
  • 30. Kulkarni, K. A., Katiyar, S., Surolia, A., Vijayan, M., and Suguna, K. (2007) Proteins 68, 762-769
  • 31. Boraston, A. B., Nurizzo, D., Notenboom, V., Ducros, V., Rose, D. R., Kilburn, D. G., and Davies, G. J. (2002) J. Mol. Biol. 319, 1143-1156
  • 32. Henshaw, J., Horne, A., Lammerts van Bueren, A., Money, V. A., Bolam, D. N., Czjzek, M., Ekborg, N. A., Weiner, R. M., Hutcheson, S. W., Davies, G. J., Boraston, A. B., and Gilbert, H. J. (2006) J. Biol. Chem. 281, 17099-17107
  • 33. Simpson, P. J., Xie, H., Bolam, D. N., Gilbert, H. J., and Williamson, M. P. (2000) J. Biol. Chem. 275, 41137-41142
  • 34. Lammerts van Bueren, A., and Boraston, A. B. (2004) J. Mol. Biol. 340, 869-879
  • 35. Read, R. J. (1986) Acta Crystallogr. Sect. A 42, 140-149
