RNA. 2009 Mar;15(3):432–449. doi: 10.1261/rna.1378909

EcI5, a group IIB intron with high retrohoming frequency: DNA target site recognition and use in gene targeting

Fanglei Zhuang 1,2,3, Michael Karberg 1,2,3,4, Jiri Perutka 1,2,3, Alan M Lambowitz 1,2,3
PMCID: PMC2657007  PMID: 19155322

Abstract

We find that group II intron EcI5, a subclass CL/IIB1 intron from an Escherichia coli virulence plasmid, is highly active in retrohoming in E. coli. Both full-length EcI5 and an EcI5-ΔORF intron with the intron-encoded protein expressed separately from the same donor plasmid retrohome into a recipient plasmid target site at substantially higher frequencies than do similarly configured Lactococcus lactis Ll.LtrB introns. A comprehensive view of DNA target site recognition by EcI5 was obtained from selection experiments with donor and recipient plasmid libraries in which different recognition elements were randomized. These experiments suggest that EcI5, like other mobile group II introns, recognizes DNA target sequences by using both the intron-encoded protein and base-pairing of the intron RNA, with the latter involving EBS1, EBS2, and EBS3 sequences characteristic of class IIB introns. The intron-encoded protein appears to recognize a small number of bases flanking those recognized by the intron RNA, but their identity is different than in previously characterized group II introns. A computer algorithm based on the empirically determined DNA recognition rules enabled retargeting of EcI5 to integrate specifically at 10 different sites in the chromosomal lacZ gene at frequencies up to 98% without selection. Our findings provide insight into modes of DNA target site recognition used by mobile group II introns. More generally, they show how the diversity of mobile group II introns can be exploited to provide a large variety of different target specificities and potentially other useful properties for gene targeting.

Keywords: DNA–protein interactions, gene targeting, retrotransposon, reverse transcriptase, ribozyme

INTRODUCTION

Mobile group II introns are a large family of retroelements found in bacterial and organellar genomes (Lambowitz and Zimmerly 2004; Pyle and Lambowitz 2006; Toro et al. 2007). They consist of an autocatalytic intron RNA and a multifunctional intron-encoded protein (IEP), which has reverse transcriptase (RT) activity and acts together with the intron RNA to promote RNA splicing and intron mobility. Three major classes of group II intron RNAs, denoted IIA, IIB, and IIC, as well as subclasses of IIA and IIB introns (IIA1 and IIA2, and IIB1 and IIB2, respectively), have been distinguished by structural characteristics (Michel et al. 1989; Toor et al. 2001; Dai and Zimmerly 2002a; Toro et al. 2007; Simon et al. 2008). The mobile group II intron RNAs have coevolved with their IEPs to form lineages denoted chloroplast-like (CL), mitochondrial-like (ML), and bacterial A-F (Toor et al. 2001; Zimmerly et al. 2001; Toro et al. 2002; Simon et al. 2008). Bacterial A-F introns are found only in eubacteria and archaea (Simon et al. 2008). By contrast, ML and CL group II introns are found both in prokaryotes and in eukaryotes, mainly in mitochondria and chloroplasts, respectively, possibly reflecting their association with bacterial endosymbionts that gave rise to these organelles (Simon et al. 2008). Although several ML group IIA introns have been studied in detail, there have been few similarly detailed studies of group IIB and IIC introns, and consequently the extent of variation in group II intron RNA splicing and mobility mechanisms is incompletely understood.

The major features of group II intron mobility mechanisms were elucidated by studies of three closely related ML group IIA introns: the Saccharomyces cerevisiae mtDNA aI1 and aI2 introns and the Lactococcus lactis Ll.LtrB intron (Zimmerly et al. 1995a,b; Yang et al. 1996; Eskes et al. 1997; Cousineau et al. 1998). These introns encode proteins with four conserved domains denoted RT, X (thumb), DNA-binding (D), and DNA endonuclease (En) (Lambowitz and Zimmerly 2004; Blocker et al. 2005). The IEP assists the splicing of the intron by stabilizing its catalytically active RNA structure and remains bound to the excised intron lariat RNA in a ribonucleoprotein particle (RNP) that promotes intron mobility. RNPs initiate mobility by recognizing DNA target sites, with both the IEP and base-pairing of the intron RNA contributing to the recognition of the DNA target sequence. In the major mobility pathway, retrohoming, the intron RNA reverse splices into a target sequence in a DNA strand, while the IEP uses the C-terminal En domain to cleave the opposite strand. The IEP then uses the cleaved 3′ end as a primer for synthesis of an intron cDNA, which is integrated into the recipient DNA by host cell DNA recombination or repair mechanisms (Eskes et al. 1997, 2000; Cousineau et al. 1998; Smith et al. 2005). The Ll.LtrB intron can also retrohome at lower frequency by an En-independent mechanism in which a nascent strand at a DNA replication fork rather than a cleaved DNA target site is used to prime reverse transcription (Zhong and Lambowitz 2003), and both the En-dependent and En-independent mechanisms are used for retrotransposition of the Ll.LtrB intron to ectopic sites (Coros et al. 2005).

Mobile group IIB introns are diverse, having diverged into at least six distinct lineages (CL and bacterial A, B, D, E, and F) (Simon et al. 2008). Most CL and B introns encode RTs with an En domain, while lineages A, C, D, E, and F introns encode proteins lacking this domain (denoted En− group II introns) (Zimmerly et al. 2001; Toro et al. 2007; Simon et al. 2008). One En− group IIB intron, the Sinorhizobium meliloti RmInt1 intron, has been shown to retrohome efficiently by using a nascent strand at a DNA replication fork to prime reverse transcription (Martínez-Abarca et al. 2004). Group IIC introns have the smallest intron RNAs, encode proteins lacking an En domain, and rely on En-independent mechanisms for mobility (Robart et al. 2007).

The variations between group II intron classes extend to their mechanisms of DNA target site recognition. The yeast and lactococcal group IIA introns recognize DNA target sequences of 30–35 base pairs (bp), with the central region encompassing the intron-insertion site recognized by base-pairing of the intron RNA and the flanking regions recognized by the IEP (Guo et al. 1997; Mohr et al. 2000; Singh and Lambowitz 2001). In these and other group IIA introns, the RNA sequences that base pair to the DNA target site are located in domain I (DI) and are denoted EBS1, EBS2, and δ, and the complementary DNA target site sequences are denoted IBS1 and IBS2 in the 5′ exon and δ′ in the 3′ exon (Michel and Ferat 1995; EBS and IBS denote exon- and intron-binding site, respectively). Detailed studies of the Ll.LtrB intron suggest that the IEP first recognizes several specific bases upstream of IBS2 in the distal 5′-exon region of the DNA target, and these interactions lead to local DNA melting, enabling the intron RNA to base pair to the DNA target site for reverse splicing (Singh and Lambowitz 2001). Second-strand cleavage occurs after a lag and requires additional IEP interactions with the 3′ exon (Guo et al. 1997; Mohr et al. 2000; Singh and Lambowitz 2001).

Group IIB and IIC intron RNPs likewise recognize relatively long DNA target sites with the central region containing the intron-insertion site recognized by base-pairing of the intron RNA and the flanking regions recognized by the IEP (Jiménez-Zurdo et al. 2003; Robart et al. 2007). In the case of group IIB introns, base-pairing with the 5′ exon involves EBS1/IBS1 and EBS2/IBS2 interactions similar to those of group IIA introns, but EBS2 is located in a DI junctional loop rather than a stem–loop, and the 3′-exon sequence flanking the intron-insertion site (now referred to as IBS3 instead of δ′) is recognized by a sequence element denoted EBS3 in a different DI junctional loop (Costa et al. 2000). Group IIC introns use EBS1/IBS1 and EBS3/IBS3 interactions for recognition of the 5′ and 3′ exons, respectively, but the EBS1/IBS1 interaction is generally shorter than for IIA or IIB introns, and there is no recognizable EBS2/IBS2 interaction. Instead a number of group IIC introns have a stem–loop corresponding to a transcription terminator or attC site upstream of IBS1 (Granlund et al. 2001; Dai and Zimmerly 2002a; Quiroga et al. 2008), and this stem–loop appears to be recognized by the IEP and/or the intron RNA for RNA splicing and DNA integration (Robart et al. 2007).

We showed previously that the group IIA intron Ll.LtrB could be retargeted to insert into different DNA target sequences by modifying the base-pairing sequences of the intron RNA (Guo et al. 2000; Mohr et al. 2000). This feature made it possible to develop Ll.LtrB into a highly efficient bacterial gene targeting vector (“targetron”), which has programmable target specificity. The Ll.LtrB targetron has been used in diverse Gram-negative and Gram-positive bacteria (Karberg et al. 2001; Frazier et al. 2003; Perutka et al. 2004; Chen et al. 2005, 2007; Yao et al. 2006; Heap et al. 2007; Shao et al. 2007; Yao and Lambowitz 2007; Malhotra and Srivastava 2008; Rodriguez et al. 2008; Sayeed et al. 2008), and has the potential to be used similarly for gene targeting in eukaryotes (Guo et al. 2000; Mastroianni et al. 2008). The ability to retarget the Ll.LtrB intron and use it as a gene targeting vector reflects a combination of beneficial properties, including that active intron RNA and IEP can be expressed readily from a donor plasmid, that IEP interactions with the DNA target site are relatively few and flexible, that the base-pairing interactions between the intron RNA and DNA target sequence can be modified, and that these base-pairing interactions are sufficiently long to confer high specificity. Thus far, no other group II intron has been shown to be similarly useful for gene targeting, and it is not clear to what extent the required combination of characteristics will be found among other group II introns.

Here, we studied the group II intron EcI5, which was discovered in Escherichia coli virulence plasmid pO157 (Burland et al. 1998). EcI5 is a CL/IIB1 intron, whose IEP contains an En domain (Fig. 1; Burland et al. 1998; Dai and Zimmerly 2002b). We found that EcI5 is highly active in retrohoming. We analyzed its mechanism of DNA target site recognition and used the information to develop a computer algorithm that enables EcI5 to be retargeted to insert into new sites with high frequency and specificity.

FIGURE 1.

Group II intron EcI5 and its IEP. (A) Predicted secondary structure of EcI5 RNA; data adapted from Dai and Zimmerly (2002b). The structure consists of six double-helical domains (DI–DVI). The 5′- and 3′-splice sites are indicated by arrows. The putative Shine–Dalgarno (SD) sequence, initiation and termination codons of the intron ORF, and sequences involved in tertiary interactions (Greek letters) are boxed. The inset (top right) shows the predicted secondary structure of DIVb of the EcI5-ΔORF intron, which has an MluI site (boxed) added at the site of the ORF deletion. (B) EcI5 IEP. Conserved protein domains shared with other group II IEPs are RT, containing conserved amino acid sequence blocks 1–7 characteristic of the finger and palm regions of retroviral RTs; X, region associated with maturase activity corresponding in part to the RT thumb; D, DNA binding; and En, DNA endonuclease. (0) region conserved in RTs of non-LTR retroelements. Conserved amino acids that are sites of mutations and the boundaries of the ΔD/En and ΔEn truncations analyzed in the present work are shown below.

RESULTS

EcI5 is mobile in E. coli

EcI5 was found to be actively mobile in E. coli in a screen of candidate introns using donor and recipient plasmid vectors that were used previously for mobility assays with Ll.LtrB (Guo et al. 2000; Karberg et al. 2001). The donor plasmid for EcI5 was pACDF-EcI5, a CamR pACYC184-derivative that uses the T7lac promoter to express the full-length intron and flanking exons (Fig. 2A). The recipient plasmid was pBRR-EcI5, an AmpR pBR322-derivative that contains the putative EcI5 target site, corresponding to the ligated-exon sequences flanking the EcI5 insertion site in E. coli virulence plasmid pO157 (Fig. 2C). The EcI5 target site in the recipient plasmid is cloned upstream of a promoterless tetR gene for use in quantitative mobility assays and selections described below. For the initial screening assays, the donor and recipient plasmids were transformed into E. coli HMS174(DE3), which harbors an isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible phage T7 RNA polymerase, and the cells were grown in liquid culture and induced with IPTG. Plasmid DNA was then isolated and integration of the intron into the recipient plasmid target site was assessed by PCR of the 5′-integration junction.

FIGURE 2.

EcI5 intron donor and recipient plasmids used in intron mobility assays. (A) pACDF-EcI5 uses a T7lac promoter (PT7lac) to express full-length EcI5 with flanking 5′- and 3′-exon sequences (E1 and E2, respectively). (B) pACD2-EcI5 expresses the EcI5-ΔORF intron and flanking exons, with a phage T7 promoter (PT7; arrowhead in intron) inserted in DIV, and the IEP expressed from a position downstream of the 3′ exon. SD is the Shine–Dalgarno sequence for the intron ORF. (C) pBRR-EcI5 contains an EcI5 target site (ligated E1 and E2 sequences) cloned upstream of a promoterless tetR gene. Donor plasmids are derivatives of pACYC184 and carry a camR marker, while the recipient plasmid is a derivative of pBR322 and carries an ampR marker. T1 and T2 are E. coli rrnB transcription terminators. T1 terminates both E. coli and phage T7 RNA polymerase, while T2 terminates E. coli but not phage T7 RNA polymerase. Tφ is a phage T7 transcription terminator, which terminates T7 RNA polymerase.

Figure 3A compares the performance of EcI5 in such an assay with that of the full-length Ll.LtrB intron using the same donor and recipient plasmid vectors. For Ll.LtrB, a PCR product corresponding to the 5′-integration junction was detected only after IPTG induction, while for EcI5, a strong band corresponding to the 5′-integration junction was detected with or without IPTG induction, the latter presumably reflecting low-level transcription in the absence of promoter activation. Integration of EcI5 at the correct site was confirmed by sequencing the PCR product (not shown). In controls, PCR of mixed donor and recipient plasmids did not give the integration junction products for either intron (Fig. 3A, “Mix” lanes), confirming that these products result from intron mobility and not template switching during PCR. Additional experiments showed that EcI5 mobility is abolished by the deletion of intron RNA domain V (ΔDV), which is required for the ribozyme activity of the intron RNA (Jarrell et al. 1988), or by a mutation (ATG → ATT) in the initiation codon of the intron ORF, which inhibits IEP expression while minimally affecting intron RNA structure (Fig. 3B, −IEP). The latter findings indicate that EcI5 mobility requires both a catalytically active intron RNA and the IEP, as expected for the retrohoming pathway used by other mobile group II introns.

FIGURE 3.

The native EcI5 intron is mobile in E. coli. (A) PCR-based mobility assays. Donor plasmids expressing the full-length Ll.LtrB and EcI5 introns (pACD-LtrB and pACDF-EcI5, respectively) were transformed into E. coli HMS174(DE3) together with their respective recipient plasmids, pBRR3-ltrB and pBRR-EcI5. After growing the cells at 37°C to early log phase and incubating with 0, 100, or 500 μM IPTG for 2 h to induce intron expression, plasmid DNA was isolated, and integration of the intron into the target site was detected by PCR of the 5′-integration junction, using a primer (RECSEQ01R), which anneals to the recipient plasmid vector backbone, and an intron-specific primer (LtrBAS2.2 and EcI5MOB01 for Ll.LtrB and EcI5, respectively; see Supplemental Table 2 for primer sequences). For each PCR, the right-hand lane (“Mix”) shows a control using the same primers and 100 ng each of the donor and recipient plasmids. The arrows indicate the 603- and 570-bp PCR products corresponding to the 5′-integration junctions of the Ll.LtrB and EcI5 introns, respectively. The larger bands in the “Mix” lanes are due to nonspecific priming on the plasmid DNA templates, which are present at relatively high concentrations, and those at the bottom of the gel are primer dimers. (B) Dependence of EcI5 intron mobility on the catalytic activity of the intron RNA and IEP expression. A mobility assay was carried out as in panel A with pACDF-EcI5 expressing wild-type (WT) EcI5 or mutant introns with a deletion of intron domain V (ΔDV) or a point mutation in the initiation codon of the intron ORF (ATG → ATT; −IEP). The arrow indicates the 570-bp PCR product corresponding to the 5′-integration junction. The lighter bands in the lanes for EcI5 mutants were sequenced and found to be PCR artifacts, presumably due to nonspecific annealing of the primers.

Construction of an EcI5-ΔORF intron and its use in genetic assays of intron mobility

Previous studies showed that the retrohoming frequency of the L. lactis Ll.LtrB intron expressed from a pACD-based donor plasmid could be increased as much as 40,000-fold by deleting intron ORF sequences from DIV and expressing the IEP from a position downstream of the 3′ exon on the same plasmid (Guo 2000; Guo et al. 2000). This dramatically increased retrohoming frequency is thought to be due largely to the decreased nuclease susceptibility of the smaller ΔORF intron in E. coli. With this configuration, mobility frequencies for Ll.LtrB were high enough to carry out selection experiments using libraries of donor and recipient plasmids with randomized recognition elements, enabling determination of detailed DNA target site recognition rules for that intron (Guo et al. 2000; Zhong et al. 2003).

To be able to do similar experiments for EcI5, we constructed the modified intron-donor plasmid pACD2-EcI5, which expresses an EcI5-ΔORF intron and short flanking exons, with the IEP expressed from a position downstream of the 3′ exon (Fig. 2B). The EcI5-ΔORF intron was constructed by replacing intron ORF sequences (nucleotide residues 733–2265) from subdomain IVb with an MluI site (Fig. 1A, inset, upper right). The latter provides a convenient location for cloning cargo genes into the intron. The EcI5-ΔORF intron retains subdomain DIVa, which is a high-affinity binding site for the IEP in group IIA introns (Wank et al. 1999; Huang et al. 2003), as well as the residual DIVb stem and subdomain DIVc, a small stem–loop structure found near the 3′ end of DIV (see Fig. 1). For quantitative mobility assays, a phage T7 promoter was cloned into the MluI site in DIV. We confirmed by RT-PCR that the EcI5-ΔORF intron expressed from the pACD2-EcI5 donor plasmid splices efficiently in E. coli (not shown).

For mobility assays, donor plasmid pACD2-EcI5 and the recipient plasmid pBRR3-EcI5, which contains the EcI5 target site (positions −30 to +15 from the intron-insertion site) cloned upstream of a promoterless tetR gene, were cotransformed into E. coli HMS174(DE3). After induction of intron expression with IPTG, integration of the EcI5-ΔORF intron carrying the phage T7 promoter into the recipient plasmid target site activates the expression of the downstream tetR gene, enabling selection of mobility events by plating on LB agar containing tetracycline and ampicillin. Mobility efficiencies were then calculated as the ratio of TetR + AmpR to AmpR colonies.
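
The mobility-efficiency calculation described above can be sketched as follows. This is a minimal illustration, assuming that both colony counts come from plating the same culture at the same dilution; the function name and the example counts are hypothetical and are not data from this study.

```python
def mobility_efficiency(tet_amp_colonies: int, amp_colonies: int) -> float:
    """Return mobility efficiency (percent) as (TetR+AmpR colonies) / (AmpR colonies).

    Assumes both counts were obtained at the same dilution, so no
    dilution-factor correction is applied (an assumption of this sketch).
    """
    if amp_colonies <= 0:
        raise ValueError("at least one AmpR colony is needed to compute a ratio")
    if tet_amp_colonies < 0:
        raise ValueError("colony counts cannot be negative")
    return 100.0 * tet_amp_colonies / amp_colonies


# Hypothetical plate counts, chosen only to illustrate the arithmetic.
print(mobility_efficiency(154, 200))
```

If the two plates were counted at different dilutions, each count would first need to be scaled by its dilution factor before taking the ratio.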

As shown in Figure 4, the EcI5-ΔORF intron expressed from pACD2-EcI5 at a low level without IPTG induction had a mobility efficiency of 77% in this assay, about three times higher than a similarly configured Ll.LtrB-ΔORF intron donor plasmid assayed in parallel (24%). When induced with 100 μM IPTG, both the Ll.LtrB-ΔORF and EcI5-ΔORF introns have mobility efficiencies close to 100%. As expected, mutations in the conserved YADD motif in the RT domain of the EcI5 IEP or a C-terminal truncation that deletes both the putative D and En domains (ΔD/En) abolished EcI5 mobility without or with IPTG induction. Further, En-domain mutations (C-terminal truncation ΔEn or mutations H528A and H552A in the putative En active site) decreased EcI5 mobility frequencies without IPTG induction to <0.61% (Fig. 4). Surprisingly, however, these En-domain mutants still gave high mobility when induced with 100 μM IPTG (Fig. 4), presumably by using alternate En-independent retrohoming mechanisms (see Introduction). Together, these findings suggest that EcI5 uses retrohoming pathways similar to those for the Ll.LtrB intron, but with substantially higher integration frequencies, enabling the alternative En-independent pathways to operate at high efficiency.

FIGURE 4.

Mobility assays for wild-type and mutant introns. Donor plasmids (pACD2X for Ll.LtrB and pACD2-EcI5 or derivatives expressing mutant IEPs for EcI5) and recipient plasmids (pBRR3-ltrB for Ll.LtrB and pBRR3-EcI5 for EcI5) were transformed into E. coli HMS174(DE3). The cells were grown at 37°C to early log phase and incubated with 0 or 100 μM IPTG for 1 h. The introns carry a phage T7 promoter in DIV and integrate into a target site cloned in the recipient plasmid upstream of a promoterless tetR gene, thereby activating that gene (see Fig. 2 and Materials and Methods). After induction, cells were plated on LB agar containing ampicillin or ampicillin plus tetracycline, and mobility efficiencies were calculated as the ratio of (TetR+AmpR)/AmpR colonies. The bar graphs show the mean for three determinations, with the error bar indicating the standard deviation.

Overproduction of EcI5 is not toxic to E. coli

In the experiment of Figure 4, we noticed that IPTG induction of EcI5 expression in the presence of the EcI5 recipient plasmid is ∼100-fold more toxic to E. coli than is induction of the Ll.LtrB intron in the presence of its recipient plasmid (Supplemental Fig. 1A). Because we wished to use EcI5 for gene targeting, we were concerned initially that this toxicity might reflect a high frequency of ectopic integration or indiscriminate cleavage of E. coli chromosomal DNA by the overexpressed intron. Additional controls, however, showed that elevated toxicity is observed only when EcI5 is expressed in the presence of its own recipient plasmid and that EcI5 expressed by itself or with the Ll.LtrB recipient plasmid is not significantly more toxic than is expression of the Ll.LtrB intron or the benign protein GFP from the same donor plasmid pAC-GFP (Zhao and Lambowitz 2005; Supplemental Fig. 1B). A likely explanation is that the integration frequency of EcI5 into its recipient plasmid target site is so high that multiple plasmids are targeted in each cell, leading to overexpression of the tetR gene, which is known to be toxic (Eckert and Beck 1989). Thus, the experiment of Figure 4 may significantly underestimate the mobility frequency difference between EcI5 and Ll.LtrB.

Identification of critical nucleotide residues in the distal 5′-exon and 3′-exon regions of the EcI5 DNA target site

To determine DNA target site recognition rules for the EcI5 intron, we used the plasmid-based TetR selection assay to carry out experiments with donor and recipient plasmid libraries in which different recognition elements were randomized. As an overview for these experiments, Figure 5 (top) shows the DNA target sequence for the EcI5 intron and its predicted base-pairing interactions with the intron RNA. The EBS2, EBS1, and EBS3 sequences, each located in a different region of DI, potentially base pair to DNA target site sequences IBS2, IBS1, and IBS3, respectively, spanning positions −13 to +1 from the intron-insertion site. By analogy with the Ll.LtrB intron, the IEP is expected to recognize DNA target site positions both upstream of and downstream from those that base pair with the intron RNA.

FIGURE 5.

Identification of critical nucleotide residues in the distal 5′- and 3′-exon regions of the EcI5 DNA target site. E. coli HMS174(DE3) containing the wild-type intron-donor plasmid pACD2-EcI5 and a recipient plasmid library with random nucleotide residues at positions −35 to −14 and +2 to +20 from the intron-insertion site was grown at 37°C to early-log phase and induced with 100 μM IPTG for 1 h. The cells were then plated on LB agar containing tetracycline and ampicillin to select those in which the EcI5-ΔORF intron carrying a phage T7 promoter integrated into functional target sequences upstream of the promoterless tetR gene in the recipient plasmid. The plasmids were isolated by a mini-prep procedure and sequenced, using primer ForpBRR for the 5′-integration junction and Rev2pBRR for the 3′-integration junction (Supplemental Table 2). The top shows the EcI5 DNA target sequence from positions −35 to +20 and its predicted EBS2/IBS2, EBS1/IBS1, and EBS3/IBS3 base-pairing interactions with the intron RNA. The arrow indicates the intron-insertion site. The WebLogo representation (Crooks et al. 2004) depicts nucleotide frequencies at each randomized position in 101 selected target sites, corrected for biases in the initial pool based on sequences of 108 unselected recipient plasmids, as described in Materials and Methods. Nucleotide frequencies (percent) at each position in the selected and unselected plasmids are summarized below. In some cases, percentage totals do not equal 100 due to rounding off.

To identify nucleotide residues potentially recognized by the IEP, we carried out a selection experiment in which the wild-type EcI5-ΔORF intron integrates into recipient plasmid target sites with randomized nucleotide residues from positions −35 to −14 and +2 to +20 of the target sequence. After plating on LB agar containing tetracycline and ampicillin, plasmids containing integrated introns were isolated by a miniprep procedure, and the sequences of active target sites were determined by sequencing the 5′- and 3′-integration junctions. In the experiment of Figure 5, we determined target sequences for 101 active target sites selected in the experiment and 108 plasmids from the initial pool to correct for nucleotide frequency biases, as described in Materials and Methods. Figure 5 (bottom) summarizes the results in WebLogo format, along with nucleotide frequencies at each randomized position in both the selected target sites and the initial pool.
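
One plausible way to correct selected nucleotide frequencies for biases in the initial pool is sketched below. The exact procedure is the one given in Materials and Methods, which is not reproduced here; this sketch assumes a simple ratio-and-renormalize scheme, and the function name and example frequencies are hypothetical.

```python
BASES = "ACGT"


def corrected_frequencies(selected: dict, unselected: dict) -> dict:
    """Divide each selected frequency by the pool frequency, then renormalize.

    ASSUMPTION: ratio-and-renormalize is used here for illustration only;
    it is not confirmed to be the correction applied in the paper.
    """
    ratios = {}
    for base in BASES:
        if unselected[base] <= 0:
            raise ValueError(f"base {base} is absent from the unselected pool")
        ratios[base] = selected[base] / unselected[base]
    total = sum(ratios.values())
    if total == 0:
        raise ValueError("no selected counts at this position")
    return {base: ratios[base] / total for base in BASES}


# Hypothetical frequencies at one randomized position (fractions, not percent).
selected = {"A": 0.80, "C": 0.10, "G": 0.05, "T": 0.05}
pool = {"A": 0.40, "C": 0.20, "G": 0.20, "T": 0.20}
print(corrected_frequencies(selected, pool))
```

In this example the pool is itself enriched for A, so the corrected A frequency (about 0.67) is lower than the raw selected frequency (0.80).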

In the distal 5′-exon region, the most strongly conserved nucleotide residues in the target sequence were C-18, C-17, A-15, and A-14, with A-14 found in 100% of the active target sites, and in the 3′ exon, the most strongly conserved nucleotide residue was T+5, which was found in 80% of active target sites. Several other 3′-exon positions in the EcI5 target site showed some selection for specific nucleotide residues (+2, +3, +4, +6, and +10). In agreement with the selection experiment, mobility assays showed that single nucleotide substitutions at A-14, which was found in 100% of active target sites, strongly decreased mobility efficiencies (A-14C, A-14G, and A-14T at 1.0%, 1.4%, and 1.6% of wild type, respectively; average of two assays done as in Fig. 4 without IPTG induction). As expected, single nucleotide substitutions at other key positions identified in the selections also inhibited mobility but less severely than mutations at A-14 (C-18T, A-15C, and T+5C at 8%, 2%, and 20% of wild type, respectively, without IPTG induction).
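
The position numbering used here (negative positions in the 5′ exon, positive positions in the 3′ exon, no position 0) can be made concrete with a short sketch that checks a candidate site for the most strongly conserved residues named above (C-18, C-17, A-15, A-14, and T+5). This is only an illustration of the coordinate system; it is not the retargeting algorithm developed in this work, and the function names and the test sequence are hypothetical.

```python
# Most strongly conserved residues reported in the selection (top-strand DNA).
KEY_POSITIONS = {-18: "C", -17: "C", -15: "A", -14: "A", 5: "T"}


def base_at(seq: str, insertion_index: int, position: int):
    """Return the base at a target-site position, or None if out of range.

    The intron inserts between seq[insertion_index - 1] (position -1)
    and seq[insertion_index] (position +1); there is no position 0.
    """
    if position == 0:
        raise ValueError("target-site numbering has no position 0")
    idx = insertion_index + position if position < 0 else insertion_index + position - 1
    if 0 <= idx < len(seq):
        return seq[idx]
    return None


def key_position_matches(seq: str, insertion_index: int) -> dict:
    """Map each key position to True/False for a match to the conserved base."""
    return {
        pos: base_at(seq, insertion_index, pos) == base
        for pos, base in KEY_POSITIONS.items()
    }


# Synthetic sequence (not the EcI5 target site): 20 nt upstream + 10 nt downstream.
site = "GGCCGAA" + "G" * 13 + "GGGGTGGGGG"
print(key_position_matches(site, 20))
```

A real target-site search would also need to weigh the weaker preferences at other positions and the EBS/IBS base-pairing, which this sketch deliberately leaves out.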

In addition to identifying nucleotide residues potentially recognized by the IEP, the selection data in Figure 5 suggest that the EBS3/IBS3 interaction is limited to position +1, because positions +2 and +3 do not show strong selection for nucleotide residues that could base pair with U residues in the intron RNA to extend this interaction. DNA target site position +2 does show weak selection for a non-wild-type A residue that could pair with the U residue at EBS3 + 2, but also shows similar weak selection for a non-wild-type T residue that cannot base pair with this U residue. These weak selections may reflect that a weak A-T or T-A DNA base pair at target site position +2 helps accommodate the EBS3/IBS3 interaction between the intron RNA and DNA target site at position +1.

The selected target sites have GC contents ranging from 18% to 77% from position −35 to −14 and 21%–74% from position +2 to +20, indicating that EcI5 can integrate into DNA target sites having a wide range of melting temperatures (data not shown). Analysis of 30 selected target sites using DNA Mfold (http://mfold.bioinfo.rpi.edu/cgi-bin/dna-form1.cgi) did not reveal any conserved stem–loop structures in the EcI5 DNA target site. Using a chi-square test with probability level P = 0.001 and nine degrees of freedom, we found two potential covariations between nucleotide residues in the DNA target sequence (positions −24 and −18, χ² = 30.3; and positions +9 and +11, χ² = 28.8). However, both are based on dinucleotides with low (<10) observed counts, and their significance in this relatively small data set is unclear.
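
A covariation test of this kind can be sketched as a chi-square test of independence on the 4 × 4 table of dinucleotide counts for two positions, which has (4 − 1) × (4 − 1) = 9 degrees of freedom. The sketch below is an assumption about the general form of such a test rather than the authors' exact procedure; the function name and the toy input are hypothetical. The critical value of about 27.88 is the standard tabulated value for 9 degrees of freedom at P = 0.001.

```python
from collections import Counter

BASES = "ACGT"
CRITICAL_CHI2_DF9_P001 = 27.88  # tabulated chi-square value, 9 df, P = 0.001


def chi_square_covariation(sites, i, j):
    """Chi-square statistic for independence of columns i and j (0-based).

    `sites` is a list of equal-length DNA strings. Expected counts are
    (row total * column total) / n. Cells with zero expected count are
    skipped, which also means the effective degrees of freedom fall
    below 9 if some base never occurs at one of the two positions.
    """
    n = len(sites)
    if n == 0:
        raise ValueError("no sequences supplied")
    pair_counts = Counter((s[i], s[j]) for s in sites)
    row_totals = Counter(s[i] for s in sites)
    col_totals = Counter(s[j] for s in sites)
    chi2 = 0.0
    for a in BASES:
        for b in BASES:
            expected = row_totals[a] * col_totals[b] / n
            if expected > 0:
                chi2 += (pair_counts[(a, b)] - expected) ** 2 / expected
    return chi2


# Toy data (not from this study): two perfectly covarying positions.
toy = ["AA", "CC", "GG", "TT"] * 10
stat = chi_square_covariation(toy, 0, 1)
print(stat, stat > CRITICAL_CHI2_DF9_P001)
```

With roughly 100 sequences spread over 16 dinucleotide cells, many expected counts are small, which is consistent with the caution expressed above about the reliability of the two reported covariations.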

Rules for the EBS1/IBS1 pairing

The predicted EBS1/IBS1 interaction between wild-type EcI5 and its DNA target site encompasses positions −5 to −1 from the intron-insertion site (Fig. 5). To determine rules for base-pairing between EBS1 and IBS1, we carried out a selection experiment in which positions −7 to −1 in the EBS1 stem–loop in the intron and the corresponding positions in the IBS1 region of the DNA target site were both randomized (Fig. 6A). Because the EBS1/IBS1 pairing is also required for RNA splicing, IBS1 positions −7 to −1 in the 5′ exon of the donor plasmid were also randomized to provide complementary nucleotide combinations in the pool (Guo et al. 2000). After selection, the 5′-integration junctions were amplified from the selected TetR+AmpR colonies by colony PCR, and the PCR products were sequenced to identify active EBS1/IBS1 combinations. We obtained sequences for 87 independent intron-integration events, along with 94 unselected donor plasmids and 95 unselected recipient plasmids to correct for nucleotide frequency biases in the initial pools. Figure 6B summarizes nucleotide frequencies at each of the randomized positions, with the results depicted in WebLogo format above, while Figure 6C,D, and Table 1 summarize base-pair frequencies between intron RNA and DNA target site positions.

FIGURE 6.

Analysis of EcI5 DNA target site EBS1/IBS1 base-pairing interactions. A selection experiment was done using a pBRR3-EcI5-based recipient-plasmid library with random nucleotide residues at DNA target site positions −7 to −1 and a pACD2-EcI5 based donor-plasmid library with random nucleotide residues at the corresponding positions in the EBS1 stem–loop. The IBS1 positions in the 5′-exon of the donor plasmid were also randomized to provide complementary nucleotide combinations for RNA splicing. The selection was done as in Figure 5, except that IPTG induction of the donor plasmid library, which decreased cell viability, was replaced with a 1-h incubation at 37°C in the absence of IPTG. After plating the cells on LB agar containing tetracycline, ampicillin, and chloramphenicol, colonies containing recipient plasmids with integrated introns were picked, and the region from 5′-exon position −210 to intron position +505 was amplified by colony PCR using primers ForpBRR and EcI5AVA2AS and sequenced using primer Rseq (see Supplemental Table 2). (A) Predicted EBS1/IBS1 base-pairing interactions between the intron RNA and the top strand of the DNA target site. The nucleotide residues randomized in the selection are highlighted in gray. The arrow indicates the intron-insertion site. (B) WebLogo representation of nucleotide frequencies at randomized positions in 87 selected integration events, corrected for biases in the initial pools based on sequences of 94 unselected donor and 95 recipient plasmids from the initial pools. Nucleotide frequencies (percent) at each position in the selected and unselected plasmids are summarized below. In some cases, percentage totals do not equal 100 due to rounding off. (C) Percentage of base pairs at each EBS1/IBS1 position in 87 selected integration events (black bars) and 94 randomly paired donor and recipient plasmids from the initial pools (white bars). 
(D) Percentage of introns that have different numbers of EBS1/IBS1 base pairs with the target site in 87 selected integration events (black bars) and 94 randomly paired donor and recipient plasmids from the initial pools (white bars). In C and D, both Watson–Crick and wobble U•G and G•T pairs are counted as base pairs (Guo et al. 1997; Sugimoto et al. 2000).

TABLE 1.

Observed base-pair frequencies in selection experiments for the EBS1/IBS1, EBS2/IBS2, and EBS3/IBS3 interactions between the EcI5 intron RNA and its DNA target site

The results show selection for base pairing between the intron RNA and DNA target site from positions −6 to −1, with no selection for base pairing at position −7 (Fig. 6C; Table 1). The selection for base-pairing at position −6 was unexpected because wild-type EcI5 cannot form a canonical Watson–Crick or wobble base pair with its DNA target site at this position. Ninety-five percent of the selected intron/target site combinations have four or more base pairs in EBS1/IBS1, compared to only 21% for randomly paired introns and target-site combinations from the original pools (Fig. 6D).
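The base-pair tallies in Figure 6C,D can be illustrated with a minimal sketch that counts Watson–Crick and wobble pairs between aligned intron RNA and DNA target positions. The function name and the example sequences are hypothetical, and the pairing table follows the convention stated in the figure legend (RNA/DNA U•G and G•T counted as wobble pairs); this is not the authors' analysis code.

```python
# Allowed pairs, written as (intron RNA base, DNA target base).
WATSON_CRICK = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("U", "G")}
ALLOWED = WATSON_CRICK | WOBBLE


def count_base_pairs(ebs_rna, ibs_dna):
    """Count positions where the RNA base can pair with the facing DNA base.

    Both strings are listed position by position (for example -6 to -1),
    so character i of ebs_rna faces character i of ibs_dna.
    """
    if len(ebs_rna) != len(ibs_dna):
        raise ValueError("EBS and IBS strings must have the same length")
    return sum((r, d) in ALLOWED
               for r, d in zip(ebs_rna.upper(), ibs_dna.upper()))


# Hypothetical 4-position example: G-C, G•T, A/A mismatch, U•G -> 3 pairs
print(count_base_pairs("GGAU", "CTAG"))
```

Applying such a count to each selected intron/target combination, and to randomly paired unselected donor and recipient sequences, would give distributions of the kind compared in Figure 6D.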

In the intron RNA, EBS1 positions −6 to −1, which base pair with the DNA target site, show some selection for wild-type nucleotide residues, with the strongest selection for C-6, G-5, and U-2 (Fig. 6B). We also see weaker selection for complementary nucleotide residues in IBS1, leading to some preferences for specific base pairs between the intron RNA and DNA target site (e.g., C-G at position −6, G-C or G•T at position −5, and U-A at position −2; also observed when data are corrected for nucleotide frequency biases) (Table 1; Supplemental Table 1). The most likely explanation is that the selection for specific EBS1 nucleotide residues reflects constraints on the structure of the EBS1 RNA stem–loop, which in turn leads to selection for complementary nucleotide residues in IBS1. We cannot exclude, however, that the weak selection for some IBS1 nucleotide residues also reflects constraints on DNA structure or partial recognition of IBS1 sequences by the IEP. We also note that some EBS1/IBS1 positions show selection for non-wild-type base pairs (e.g., U-A or C-G instead of U•G at position −4 and C-G instead of U-A at position −3) (Table 1; Supplemental Table 1). Notably, EBS1 position −7, which does not base pair to the DNA target site, nevertheless shows strong selection for the wild-type U residue, almost certainly reflecting a constraint on the structure of the RNA stem–loop, as posited for the other EBS1 nucleotide residues (Fig. 6B).

Rules for the EBS2/IBS2 pairing

Figure 7 shows the results of a similar selection experiment for positions −13 to −7 potentially involved in the EBS2/IBS2 interaction. In this experiment, we obtained sequences for 102 independent integration events, along with 99 unselected donor plasmids and 103 unselected recipient plasmids to correct for nucleotide frequency biases in the pools. The data show strong selection for RNA/DNA base-pairing between positions −13 and −9, with selection against RNA/DNA base-pairing at position −8 and no selection for or against RNA/DNA base-pairing at position −7 (Fig. 7C,D).

FIGURE 7.

Analysis of EcI5 DNA target site EBS2/IBS2 base-pairing interactions by in vivo selection. A selection experiment was done as in Figure 6, using a pBRR3-EcI5-based recipient plasmid library with random nucleotide residues at DNA target site positions −13 to −7 and a pACD2-based donor plasmid library with random nucleotide residues at the corresponding positions in the EBS2 loop. The IBS2 positions in the 5′ exon of the donor plasmid were also randomized to provide complementary nucleotide combinations for RNA splicing. Colonies were selected and plasmid sequences determined via colony PCR, as described in Figure 6. (A) Predicted EBS2/IBS2 base-pairing interactions between the intron RNA and the top strand of the DNA target site. The nucleotide residues randomized in the selection are highlighted in gray. The arrow indicates the intron-insertion site. (B) WebLogo representation of nucleotide frequencies at randomized positions in 102 selected integration events, corrected for biases in the initial pools based on sequences of 99 unselected donor plasmids and 103 unselected recipient plasmids. Nucleotide frequencies (percent) at each position in the selected and unselected plasmids are summarized below. In some cases, percentage totals do not equal 100 due to rounding off. (C) Percentage of base pairs at each EBS2/IBS2 position in 102 selected integration events (black bars) and 99 randomly paired donor and recipient plasmids from the initial pools (white bars). (D) Percentage of introns that have different numbers of EBS2/IBS2 base pairs in 102 selected integration events (black bars) and 99 randomly paired donor and recipient plasmids from the initial pools (white bars). In C and D, both Watson–Crick and wobble U•G and G•T pairs are counted as base pairs.

Notably, the selection for base-pairing in EBS2/IBS2 appears stronger than in EBS1/IBS1, with each position between −13 and −9 base paired in ≥ 95% of the selected combinations and positions −10 and −9 base paired in 100% and 99% of the selected combinations, respectively (Fig. 7C). Additionally, 91% of the selected combinations have all five base pairs between positions −13 and −9, while randomly chosen intron-target site combinations from the original pool typically have three or fewer base pairs between these positions (Fig. 7D).

Within the region of EBS2 involved in base-pairing (positions −13 to −9), positions −11 to −9 show moderate to high selection for specific nucleotide residues (Fig. 7B). As in the case of EBS1, the most likely possibility is that the selection for these nucleotide residues reflects constraints on RNA structure, and these constraints in turn result in preferences for specific RNA–DNA base pairs at these positions (Table 1; Supplemental Table 1). Positions −11 and −10 show particularly strong selection for the wild-type C-G pairs, but some selection for C-G base pairs is also seen at positions −12 and −9, where the wild-type base pair is in both cases A-T. In contrast to the EBS1/IBS1 interaction, when the data are corrected for nucleotide frequency biases in the initial pools, the wild-type EBS2/IBS2 base pair appears to be preferred at each position (Supplemental Table 1).

Intron RNA positions −8 and −7, which are not involved in base-pairing with the DNA target site, nevertheless show strong selection for the wild-type A and U residues, respectively, again likely reflecting constraints on intron RNA structure (Fig. 7C). Position −7 in the DNA target site shows no selection for a specific nucleotide residue, in agreement with the results in the EBS1/IBS1 selection above, while position −8 in the DNA target site shows some selection for G or A residues, possibly reflecting prohibition against base-pairing with the strongly conserved A residue at position −8 in the intron RNA.

Rules for the EBS3/IBS3 pairing

The EBS3/IBS3 interaction in EcI5 is predicted to involve a single base pair at position +1 (Fig. 5). To confirm this prediction and assess nucleotide preferences for the EBS3/IBS3 interaction, we did a final selection experiment in which we randomized position +1 both in IBS3 of the recipient plasmid and EBS3 of the donor plasmid (Fig. 8). The results show selection for RNA/DNA base-pairing, with 71% of the selected target sites having either a Watson–Crick or wobble G•T or U•G pair at position +1, compared to only 22% in the pool (Fig. 8B). Notably, although all four Watson–Crick combinations and some mismatches (RNA/DNA C/A and G/A) are reasonably well represented, some nucleotide mismatches (A/C, U/C, A/G, G/G, C/T, and U/T) were not found, even though a relatively large number of events were analyzed for a selection involving a single base pair (Table 1). These findings raise the possibility that the EBS3/IBS3 interaction may be particularly sensitive to some mismatches.

FIGURE 8.

Analysis of EcI5 DNA target site EBS3/IBS3 base-pairing interactions by in vivo selection. A selection experiment was done as in Figure 6 with pBRR3-EcI5 randomized at IBS3 position +1 and pACD2-EcI5 randomized at EBS3 position +1. After selecting colonies containing recipient plasmids with integrated introns, the region extending from 75 bp upstream of EBS3 to 500 bp downstream from the inserted intron was amplified by colony PCR using primers EcI5 279s and Rev2-pBRR and sequenced using primer EcI5 297s (see Supplemental Table 2). (A) WebLogo representation of nucleotide frequencies in 79 selected integration events, corrected for biases in the initial pools based on sequences of 59 unselected donor plasmids and 59 unselected recipient plasmids. Nucleotide frequencies (percent) at the randomized position are summarized below. In some cases, percentage totals do not equal 100 due to rounding off. (B) Percentage of introns having Watson–Crick or wobble U•G and G•T base pairs at EBS3/IBS3 in the 79 selected integration events and 59 randomly paired donor and recipient plasmids from the initial pools.

We note that for this selection, position +1 in the 3′ exon of the donor plasmid was not randomized, and the donor plasmid retained the wild-type C-residue at this position. For the Ll.LtrB intron, the analogous δ–δ′ interaction between the intron and 3′ exon in the precursor RNA may contribute to the efficiency of RNA splicing, but generally has only a small effect on the overall intron-integration efficiency (Perutka et al. 2004). Our results suggest that this is also the case for the EBS3/IBS3 pairing in EcI5, with about half of the selected introns, those with A, C, or T residues at EBS3 + 1, unable to form an EBS3/IBS3 base pair in the precursor RNA. Nevertheless, selection for an EBS3/IBS3 base pair in the precursor RNA could contribute along with constraints on RNA structure to the observed bias for introns with the wild-type G residue at EBS3 + 1.

A computer algorithm for identifying EcI5-insertion sites and retargeting EcI5

We used the DNA target site recognition rules revealed by the selection experiments to develop a computer algorithm that identifies potential EcI5 insertion sites and designs PCR primers for modification of the intron's EBS sequences to insert into those sites. As done for the Ll.LtrB intron (Perutka et al. 2004), we used a simple probabilistic model (zero-order Markov model), which assumes that the likelihood of occurrence of a nucleotide residue in the target sequence is independent of other residues in this sequence. Model parameters are derived from the observed frequency of residues in a training set of trusted examples. Thus, the performance of the model critically depends on which target site positions are included in the calculation. Because the model is based on a limited amount of data, overtraining can occur, resulting in a model that is able to classify sequences from the training set correctly, but unable to make accurate predictions for sequences not drawn from the original training set. The more parameters that are included in the model, the greater the risk of overtraining.
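The independence assumption of a zero-order Markov model can be made concrete with a short sketch: position-specific nucleotide frequencies are estimated from a training set, and the probability of a sequence is the product of the per-position frequencies. The function names, the toy training set, and the optional pseudocount are illustrative assumptions and are not taken from the published algorithm.

```python
from collections import Counter


def position_frequencies(sequences, pseudocount=0.0):
    """Estimate nucleotide frequencies at each position of aligned sequences.

    The pseudocount is a hypothetical option for damping small-sample
    effects; the default of 0 gives the raw observed frequencies.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all training sequences must have the same length")
    total = len(sequences) + 4 * pseudocount
    freqs = []
    for p in range(length):
        counts = Counter(s[p] for s in sequences)
        freqs.append({n: (counts[n] + pseudocount) / total for n in "ACGT"})
    return freqs


def sequence_probability(seq, freqs):
    """Zero-order model: multiply independent per-position probabilities."""
    prob = 1.0
    for nucleotide, position_freq in zip(seq, freqs):
        prob *= position_freq[nucleotide]
    return prob


# Toy training set of four 2-nt "target sites" (not real data)
toy = ["AC", "AG", "AT", "CC"]
model = position_frequencies(toy)
print(sequence_probability("AC", model))  # 0.75 * 0.5
```

Each position included in the model adds three free parameters, so with on the order of a hundred training sequences the number of scored positions has to be kept small, which is the overtraining concern raised above.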

By a procedure described below, we found that a minimal set of DNA target site positions sufficient for discriminating efficient target sites from inefficient ones includes −26 to −14, −8, −6, and +2 to +10, with A-14, which is present in 100% of the selected target sites, treated as a fixed position in the model. Positions −26 to −14 and +2 to +10 are the regions presumably recognized by the IEP. Positions −6 and −8, which are not thought to be recognized by the IEP, nevertheless have moderately high information content in the EBS1/IBS1 selection of Figure 6 and the EBS2/IBS2 selection of Figure 7, respectively. For position −6, this information content reflects selection for a nucleotide residue that extends the RNA/DNA base-pairing interaction, while for position −8, it reflects selection against a nucleotide residue that extends the RNA/DNA base-pairing interaction.

The algorithm scores potential DNA target sites across a 36-bp sliding window with 1-bp increments by calculating a log-odds score S, which compares the probability P(seq|M) that the target sequence seq was generated by the model M to the probability P(seq) that the sequence was generated by chance (null model). P(seq|M) and P(seq) are calculated as the products of individual probabilities P(n_p|M) and P(n_p) that a nucleotide n at position p in the target sequence was generated by the model and by chance, respectively, with these probabilities calculated from the frequencies of the nucleotide residue in selected target sites and in the initial pool, respectively, according to the equation

$$S = \log \frac{P(\mathrm{seq} \mid M)}{P(\mathrm{seq})} = \sum_{p} \log \frac{P(n_p \mid M)}{P(n_p)}$$

A positive score indicates that the model predicts the potential EcI5 target sequence better than the null model (for additional description, see Perutka et al. 2004).
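The sliding-window scoring can be sketched as follows. This is a minimal illustration under stated assumptions, not the published program: the base-2 logarithm, the function names, the treatment of a zero frequency as a score of minus infinity, and the toy frequency tables are all assumptions. Frequency tables are keyed by target-site position (−26 to −1 and +1 to +10), and only the positions present in the table contribute to the score.

```python
import math

WINDOW = 36  # positions -26..-1 and +1..+10 (there is no position 0)


def window_index(pos):
    """Convert a target-site position to a 0-based offset in the window."""
    if pos == 0 or not -26 <= pos <= 10:
        raise ValueError("position must be in -26..-1 or +1..+10")
    return pos + 26 if pos < 0 else pos + 25


def log_odds(window, sel_freq, pool_freq):
    """Sum log2(selected frequency / pool frequency) over scored positions."""
    score = 0.0
    for pos, sel in sel_freq.items():
        n = window[window_index(pos)]
        if sel[n] == 0:
            # Residue never seen among selected sites (e.g., a fixed position)
            return float("-inf")
        score += math.log2(sel[n] / pool_freq[pos][n])
    return score


def scan(dna, sel_freq, pool_freq):
    """Score every 36-bp window of one strand in 1-bp increments."""
    return [(i, log_odds(dna[i:i + WINDOW], sel_freq, pool_freq))
            for i in range(len(dna) - WINDOW + 1)]


# Toy tables for two positions only (values are not experimental data)
uniform = {n: 0.25 for n in "ACGT"}
sel = {-14: {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
       5: {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}}
pool = {-14: uniform, 5: uniform}
```

A full scan of a gene would apply `scan` to both the sequence and its reverse complement, since target sites can lie on either strand.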

Retargeting of EcI5 to insert into sites in the E. coli lacZ gene

To test the performance of different models, we used them in the algorithm to identify potential target sites in the E. coli lacZ gene and then retargeted EcI5 to insert into those sites by modifying its EBS1, EBS2, and EBS3 sequences to form Watson–Crick base pairs at the positions recognized by base-pairing of the intron RNA (IBS2, −13 to −9; IBS1, −6 or −5 to −1; and IBS3, +1). IBS2, IBS1, and IBS3 in the 5′ and 3′ exons of the donor plasmid were also modified to be complementary to the retargeted EBS2, EBS1, and EBS3 sequences for efficient RNA splicing.

Figure 9 shows potential target sequences in the lacZ gene and their base-pairing interactions with the intron RNA, along with their log-odds scores computed by the algorithm for the model described in the preceding section. To facilitate retargeting, we constructed four different donor plasmids, pACD3-EcI5A, -EcI5C, -EcI5G, and -EcI5T, which lack the T7 promoter in DIV and have the indicated nucleotide residues at EBS3 along with the complementary nucleotide residue at IBS3 for maximally efficient RNA splicing. For retargeting, we selected the appropriate donor plasmid whose EBS3 nucleotide residue is complementary to IBS3 in the DNA target site and introduced the required modifications into the donor plasmid's EBS1, EBS2, IBS1, and IBS2 sequences by PCR, as diagrammed in Figure 10A.
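The retargeting logic described above can be sketched in a few lines: from a 36-nt target site (positions −26 to +10), extract IBS2, IBS1, and IBS3, derive the Watson–Crick RNA partners for the retargeted EBS sequences, and pick the donor plasmid whose EBS3 residue matches. The function name and the example site are hypothetical; the EBS strings are listed in the same position order as the target site (not 5′ to 3′ along the intron RNA), and primer design is not covered.

```python
# DNA target base -> Watson-Crick partner in the intron RNA
RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}


def design_ebs(target_site, ebs1_length=5):
    """Return IBS/EBS sequences and donor plasmid for one target site.

    target_site: 36-nt top-strand sequence covering positions -26..+10.
    ebs1_length: 5 (positions -5..-1) or 6 (positions -6..-1).
    """
    if len(target_site) != 36:
        raise ValueError("target site must span positions -26 to +10")
    if ebs1_length not in (5, 6):
        raise ValueError("EBS1/IBS1 interaction must be 5 or 6 bp")
    site = target_site.upper()

    def dna(first, last):
        # Inclusive slice over negative positions; -26 maps to index 0
        return site[first + 26:last + 27]

    def partner(seq):
        return "".join(RNA_PARTNER[b] for b in seq)

    ibs2 = dna(-13, -9)
    ibs1 = dna(-ebs1_length, -1)
    ibs3 = site[26]  # position +1
    ebs3 = partner(ibs3)
    return {
        "IBS2": ibs2, "EBS2": partner(ibs2),
        "IBS1": ibs1, "EBS1": partner(ibs1),
        "IBS3": ibs3, "EBS3": ebs3,
        # Donor plasmids are named by the EBS3 residue in DNA letters
        "donor": "pACD3-EcI5" + ebs3.replace("U", "T"),
    }


# Hypothetical target site (not a real lacZ sequence)
example = "A" * 13 + "ACGTA" + "CCC" + "GGTCA" + "C" + "T" * 9
print(design_ebs(example))
```

In this sketch the IBS1 and IBS2 sequences to be installed in the donor plasmid's 5′ exon are simply the target-site sequences themselves, since they must pair with the same retargeted EBS1 and EBS2.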

FIGURE 9.

DNA target site sequences and base-pairing interactions for EcI5 introns retargeted to insert at different sites in the E. coli lacZ gene. The retargeted EcI5 introns (targetrons) are denoted by a number that corresponds to the nucleotide position 5′ to the intron-insertion site in the target gene's coding sequence, followed by “a” or “s” indicating antisense or sense strand, respectively. DNA target sequences in the lacZ gene are shown from positions −26 to +10 from the intron-insertion site, with nucleotide residues that match those in the wild-type (WT) EcI5 target site highlighted in gray. The arrow indicates the intron-insertion site. Log-odds scores determined using the computer algorithm described in the text are indicated to the right, along with the targeting frequency determined by plating on LB agar containing X-gal and counting blue and white colonies. Targeting frequencies were determined for targetrons having 5- or 6-bp EBS1/IBS1 interactions. The values are mean±standard deviation for three determinations. (−) not determined; (n.d.) not detectable.

FIGURE 10.

Retargeting of EcI5 to insert into different sites in the E. coli lacZ gene. E. coli HMS174(DE3) was transformed with pACD3-EcI5 expressing the retargeted introns, grown at 37°C to early-log phase, and induced with 100 μM IPTG for 3 h unless noted otherwise. The cells were then plated on LB agar containing X-gal (40 mg/L), and the lacZ targeting frequency (Fig. 9) was determined by counting blue and white colonies. (A) Two-step PCR used to retarget the EcI5-ΔORF intron by modification of EBS and IBS sequences in the donor plasmid. The donor plasmid used as template was pACD3-EcI5A, -EcI5C, -EcI5G, or -EcI5T according to the desired EBS3 nucleotide residue. P1–P4 are primers used to modify the intron's EBS1 and EBS2 to be complementary to IBS1 and IBS2 in the DNA target site, and IBS1 and IBS2 in the 5′ exon of the donor plasmid to be complementary to the retargeted EBS1 and EBS2 for efficient RNA splicing (see Materials and Methods). The final PCR product containing the modified sequences was digested with XbaI and AvaII and swapped for the corresponding fragment of the same donor plasmid. (B) Colony PCR of E. coli lacZ disruptants obtained using retargeted EcI5 introns. Colony PCR was done with primers LacZP3 and LacZP4 flanking the intron-insertion site in the lacZ gene (see Supplemental Table 2). The figure shows representative data for three of the retargeted introns. “−colony” is a parallel PCR without a colony, and WT is a parallel PCR done on a colony of wild-type HMS174(DE3). (C) Southern hybridizations. Genomic DNA was isolated from E. coli HMS174(DE3) (WT) and the indicated lacZ disruptants grown under nonselective conditions. The DNA was digested with BglI, run in a 0.8% agarose gel, blotted to a nylon membrane, and hybridized with a 32P-labeled intron probe (see Materials and Methods). The donor plasmid pACD2-EcI5 digested with BglI was run in a parallel lane. 
The numbers to the left of the gel indicate the positions of size markers (1-kb plus ladder; Invitrogen). The schematics to the right depict the BglI fragments of the lacZ gene containing the inserted EcI5 intron. The insertion of targetron LacZ1257s, LacZ1790a, or LacZ1806s results in a 3.0-kb BglI fragment. The insertion of targetron LacZ163s disrupts the BglI site at position 164, resulting in a larger fragment (4.7 kb). The LacZ1790a disruptant shows an additional band due to residual donor plasmid, which was retained in this disruptant but lost in the other disruptants during growth under nonselective conditions.

The donor plasmids containing the retargeted introns were transformed into E. coli HMS174(DE3). After induction of intron expression with 100 μM IPTG for 3 h at 37°C, disruption of the lacZ gene was scored by plating on LB agar containing 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal), where LacZ+ colonies are blue and LacZ− colonies are white. For each retargeted intron, we scored 100 to 1000 colonies depending on the targeting frequency and confirmed integration at the desired site by colony PCR and sequencing both the 5′- and 3′-integration junctions for at least 12 white colonies (Fig. 10B).

Comparison of the experimentally determined targeting frequencies with the log-odds scores computed by the algorithm for the model described in the preceding section showed that all efficient target sites (targeting frequencies 30%–98%) have log-odds scores higher than 8.2, while all less efficient target sites (targeting frequencies 0% to 4.0%) have log-odds scores less than 8.2 (Fig. 9). More complicated models that included additional DNA target site positions or frequencies of EBS/IBS base pairs from the selection experiments did not give a similar clean separation of efficient and inefficient target sites. This situation likely reflects that the currently available selection data are too limited to accurately weight the effects of substituting different combinations of nucleotide residues in the EBS sequences of the intron RNA. We anticipate that the model will improve as more data become available. We also note that as for Ll.LtrB targetrons, some retargeted EcI5 introns with lower log-odds scores still gave experimentally useful targeting frequencies.
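The comparison between measured targeting frequencies and log-odds scores can be illustrated with a small sketch. It assumes that the targeting frequency is the percentage of white (LacZ−) colonies among all colonies counted, that the reported standard deviation is a sample standard deviation, and that a score must be strictly greater than 8.2 to be called efficient; the function names and colony counts are hypothetical.

```python
from statistics import mean, stdev

SCORE_CUTOFF = 8.2  # empirical log-odds cutoff reported in the text


def targeting_frequency(white, blue):
    """Percent of colonies with a disrupted lacZ gene (white on X-gal)."""
    total = white + blue
    if total == 0:
        raise ValueError("no colonies counted")
    return 100.0 * white / total


def summarize(frequencies):
    """Mean and sample standard deviation of replicate determinations."""
    return mean(frequencies), stdev(frequencies)


def predicted_efficient(score, cutoff=SCORE_CUTOFF):
    """True if the log-odds score exceeds the empirical cutoff."""
    return score > cutoff


# Three hypothetical determinations for one targetron
replicates = [targeting_frequency(w, b) for w, b in [(90, 10), (95, 5), (100, 0)]]
print(summarize(replicates))
```

Because the cutoff was chosen from the same ten lacZ sites it is used to classify, it is best read as a descriptive threshold for this data set rather than a validated predictor.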

All of the introns in Figure 9 were tested with a 5-bp EBS1/IBS1 interaction like that of the wild-type EcI5 with its DNA target site. Because the selection experiment of Figure 6 showed the potential for a sixth base pair at EBS1/IBS1 position −6, we compared the insertion frequencies of a number of retargeted introns with 5- or 6-bp EBS1/IBS1 DNA target site interactions. In four cases (1806s, 178a, 1790a, 187s), the additional EBS1/IBS1 base pair gave the same or increased targeting frequencies, while in four other cases (912s, 1257s, 326a, 1878s), it decreased the targeting frequency significantly (Fig. 9). These findings may reflect preferences for specific nucleotide residues at position −6 in the context of different EBS1 sequences. Until these effects are understood better, it seems preferable for retargeting EcI5 to leave the wild-type nucleotide residue at EBS1 position −6 and use the 5-bp EBS1/IBS1 interaction like that of wild-type EcI5 with its DNA target site.

We also tested whether EcI5 could be expressed from the broad-host range vector pBL1, which can be used in diverse Gram-negative bacteria without introducing a gene encoding phage T7 RNA polymerase (Yao and Lambowitz 2007). This plasmid expresses targetrons by using an m-toluic acid-inducible promoter (Pm) recognized by the host RNA polymerase. For EcI5 targetron LacZ1806s, pBL1-EcI5 gave a lacZ targeting frequency of only 4% compared to 98%–99% for the same targetron expressed from pACD3-EcI5A. In previous work, pBL1 gave high insertion frequencies with several Ll.LtrB targetrons in different bacteria and appeared to be as efficient as pACD3 for expressing Ll.LtrB targetron LacZ635s in E. coli HMS174(DE3) (Yao and Lambowitz 2007). However, additional tests comparing other Ll.LtrB LacZ targetrons in E. coli HMS174(DE3) showed that some targetrons function as efficiently when expressed from pBL1 as from pACD3, while others function less efficiently (J. Yao and A.M. Lambowitz, unpubl.). A possible explanation is that targetrons with lower splicing efficiencies due to different EBS/IBS interactions in the precursor RNA benefit from more efficient expression by T7 RNA polymerase.

For four of the retargeted EcI5 introns expressed from pACD3-EcI5 in the experiment of Figures 9 and 10, genomic DNA from the disruptants was analyzed by Southern hybridization to confirm site-specific insertion. For three disruptants, 163s, 1806s, and 1790a, the Southern blots hybridized with an intron probe showed a band of the size expected for site-specific insertion into the lacZ gene and no nonspecific insertions (the extra band for 1790a is due to residual donor plasmid) (Fig. 10C). In the remaining case, 1257s, the disruptants obtained after induction with 100 μM IPTG for 3 h showed insertion at a second site in addition to the expected site in lacZ (not shown), but this ectopic targeting was not evident when the induction was done with a lower amount of IPTG for a shorter time (50 μM IPTG, 1 h) (Fig. 10C). Together, the above findings show that EcI5, like Ll.LtrB, can be retargeted to insert efficiently and specifically into desired chromosomal DNA targets.

DISCUSSION

We find that group II intron EcI5, a subclass CL/IIB1 intron discovered in an E. coli virulence plasmid (Burland et al. 1998), is highly active in retrohoming in E. coli, and we adapted it for use in gene targeting. Like the well-studied group IIA introns S. cerevisiae aI1 and aI2 and L. lactis Ll.LtrB, EcI5 encodes a protein with conserved RT, X, D, and En domains. As expected for the retrohoming mechanism used by other group II introns, we find that EcI5 retrohoming is abolished by mutations that inhibit IEP expression or the ribozyme activity of the intron RNA, or by the mutation YADD → YAAA in the RT active site. Further, as shown for the Ll.LtrB intron (Zhong and Lambowitz 2003), En-domain mutations strongly inhibit EcI5 retrohoming, but leave residual retrohoming by En-independent mechanisms. It is a measure of the high activity of EcI5 that such En-independent retrohoming can occur at almost 100% efficiency when intron expression is induced with IPTG.

Both full-length EcI5 and a streamlined EcI5-ΔORF derivative with the IEP expressed from the same donor plasmid have higher mobility frequencies than do the corresponding Ll.LtrB intron constructs. These higher mobility frequencies could reflect more efficient production or higher stability of the intron RNA or IEP, more active RNPs, or the ability to efficiently use both En-dependent and En-independent retrohoming mechanisms. Biochemical analysis will be needed to determine whether EcI5 RNPs have inherently higher DNA integration efficiency than do Ll.LtrB RNPs. As for Ll.LtrB (Guo et al. 2000), the deletion of ORF sequences from DIV enables EcI5 to retrohome at near 100% efficiency, presumably by decreasing the nuclease susceptibility of the intron RNA, which appears to be a major factor limiting the mobility of group II introns in E. coli (Guo et al. 2000; Smith et al. 2005; Coros et al. 2008).

EcI5 was found in virulence plasmid pO157 in E. coli strain O157:H7 (Burland et al. 1998; Dai and Zimmerly 2002b). The intron is inserted within a noncoding region of the plasmid, but in the same orientation as most other plasmid genes. Southern hybridizations and PCR analysis showed that EcI5 is also inserted at the same site in DNAs isolated from 19 of 72 ECOR strains, although always in fragmented form (Dai and Zimmerly 2002b). By using the computer algorithm developed here for identifying EcI5 target sites, we found no efficient target sites for the wild-type intron (log-odds score > 8.2 and Watson–Crick or wobble base pairs at all EBS/IBS positions) in sequenced E. coli O157 genomes (EC4115, EDL933, and Sakai), and only one such target site in the nonessential gene cusS in the genome of E. coli K12 MG1655, which does not carry the plasmid. Thus, the spread of EcI5 to the E. coli chromosome may be limited by a combination of poor expression from a noncoding region of pO157 and a paucity of efficient insertion sites in the genome.

Like other mobile group II introns, EcI5 recognizes DNA target sequences by using both the IEP and base-pairing of the intron RNA. As expected for a class IIB intron, the base-pairing interactions involve EBS1 and EBS2, which base pair to the 5′ exon, and EBS3, which base pairs to the 3′ exon (Costa et al. 2000; Jiménez-Zurdo et al. 2003). Detailed analysis by selection experiments using randomized sequence libraries showed that the EBS1/IBS1 interaction can extend for 6 bp rather than 5 bp found for the interaction of the wild-type intron with its DNA target site. The selection experiments also suggest relatively strong constraints on the EBS2 sequence, likely reflecting the deleterious effect of nucleotide substitutions in the EBS2 loop on intron RNA structure. The constraints on the EBS2 sequence appear to be greater in EcI5 than in Ll.LtrB, perhaps reflecting that EBS2 is located in a junctional loop in the former and a stem–loop in the latter, a difference between IIB and IIA introns.

As for other mobile group II introns, the EcI5 IEP appears to recognize sequences in both the distal 5′-exon and 3′-exon regions of the DNA target site. In the distal 5′-exon region, the number of nucleotide residues critical for EcI5 recognition is similar to that for Ll.LtrB and other mobile group II introns, but their identity and location are different. Thus, for EcI5, the most critical distal 5′-exon nucleotide residues are C-18, C-17, A-15, and A-14, with A-14 found in 100% of active target sites (Fig. 5), while for Ll.LtrB, the most critical nucleotide residues in this region are T-23, G-21, A-20, T-19, and G-15, with none as stringently required as A-14 for EcI5 (Singh and Lambowitz 2001; Perutka et al. 2004). In the 3′ exon, EcI5 is similar to Ll.LtrB in that the only critical nucleotide residue is T+5. In Ll.LtrB, T+5 is required for second-strand cleavage by the En domain, but not for the initial DNA target site recognition or reverse splicing (Mohr et al. 2000). If that is also true for EcI5, retrohoming of EcI5 to sites lacking T+5 may occur by En-independent pathways. Although difficult to compare due to analysis by different methods, the target sequences for group II IEPs studied thus far have almost no critical nucleotide residue in common, and this divergence in IEP recognition sequences is seen even for very closely related introns, such as S. cerevisiae aI1 and aI2 (Guo et al. 1997; Yang et al. 1998; Singh and Lambowitz 2001; Jiménez-Zurdo et al. 2003; Lambowitz et al. 2005). These findings suggest that IEP recognition sequences can evolve rapidly to adapt the intron to retrohome to new sites (Singh and Lambowitz 2001).

The selection experiments enable us to calculate values for the total information content of nucleotide residues putatively recognized by the IEPs of different introns. For EcI5, this value calculated for positions −26 to −14 and +3 to +10 is 6.96 bits, while for Ll.LtrB the value calculated for the same nucleotide positions is 5.21 bits (using nucleotide frequency data of Zhong et al. 2003). As both EcI5 and Ll.LtrB are similar in potentially forming 11 or 12 bp with their DNA target sites, these findings suggest that target site recognition for EcI5 may be somewhat more stringent than for Ll.LtrB. Attesting to their very high specificity, overexpression of neither EcI5 nor Ll.LtrB is toxic to E. coli, as would be expected if these introns could indiscriminately cleave or insert into nontarget sequences. The high target specificity of mobile group II introns reflects that 11–15 nt of the DNA target sequence are recognized by base-pairing of the intron RNA and that base-pairing mismatches are expected to strongly affect the kcat for reverse splicing in addition to Km, thereby limiting off-target insertions (Xiang et al. 1998).
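One common way to compute information content in bits is the relative entropy of the selected nucleotide frequencies against a background distribution, summed over positions. The sketch below uses that definition as an assumption; the text does not state the exact formula or corrections used to obtain the 6.96- and 5.21-bit values, and the function names and example frequencies are hypothetical.

```python
import math

UNIFORM = {n: 0.25 for n in "ACGT"}


def position_information(freq, background=None):
    """Relative entropy (bits) of one position against a background.

    With a uniform background this equals 2 minus the Shannon entropy,
    so an invariant position carries 2 bits and an unselected one 0 bits.
    """
    if background is None:
        background = UNIFORM
    return sum(f * math.log2(f / background[n])
               for n, f in freq.items() if f > 0)


def total_information(freqs, backgrounds=None):
    """Sum per-position information over a list of frequency tables."""
    if backgrounds is None:
        backgrounds = [UNIFORM] * len(freqs)
    return sum(position_information(f, b) for f, b in zip(freqs, backgrounds))


# Hypothetical positions: one invariant, one two-fold degenerate, one random
invariant = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
two_fold = {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}
print(total_information([invariant, two_fold, UNIFORM]))  # 2 + 1 + 0 bits
```

Passing the unselected-pool frequencies as the background, rather than a uniform distribution, would correct for nucleotide biases in the initial libraries.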

As a gene-targeting vector, the high retrohoming efficiency and novel target specificity of EcI5 expand the number of genomic target sites that can be targeted efficiently by group II introns. Analysis of the E. coli K12 genome for potential EcI5 targeting sites using the computer algorithm developed here revealed 14,938 such sites on either DNA strand with log-odds scores >8.2, an average of one highly ranked target site per 621 nucleotide residues. In the E. coli lacZ gene (∼3 kb), the Ll.LtrB intron has five potential targeting sites with log-odds scores >8.2, of which two have been validated (targeting efficiencies >10%; Yao and Lambowitz 2007; Zhao et al. 2008), while EcI5 has six completely different targeting sites with log-odds scores >8.2, of which five have been validated (targeting efficiencies 30%–98%). Including less efficient introns whose targeting sites have lower log-odds scores, we were able to target EcI5 to a total of 10 different sites in lacZ (Fig. 9). To date, 219 full-length ORF-containing group II introns have been identified in different bacteria, and their insertion sites suggest a wide variety of IEP recognition sequences (Simon et al. 2008). Assuming that a significant fraction of these bacterial group II introns are mobile and can be adapted for gene targeting as shown here for EcI5, the diversity of homologous IEPs with novel target specificity should greatly expand the number of target sites accessible to group II introns without resorting to protein engineering.
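The site density quoted above can be checked with a short calculation. The genome size used here (about 4.64 Mb for the E. coli K12 chromosome) is an assumed value that is not stated in the text, and the function name is illustrative; both strands are counted because the 14,938 sites were found on either DNA strand.

```python
GENOME_BP = 4_640_000  # assumed approximate size of the E. coli K12 chromosome
SITES = 14_938         # sites with log-odds score > 8.2, as reported in the text


def nt_per_site(genome_bp, sites, strands=2):
    """Average number of nucleotide residues searched per predicted site."""
    return strands * genome_bp / sites


print(round(nt_per_site(GENOME_BP, SITES)))  # prints 621
```

Under this assumption the calculation reproduces the stated average of one highly ranked site per 621 nucleotide residues, which corresponds to roughly one site per 310 bp of double-stranded DNA.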

Finally, we note that EcI5 has a combination of characteristics that make it well suited for gene targeting applications. These characteristics include high retrohoming efficiency, novel IEP recognition specificity, relatively long EBS/IBS interactions between the intron RNA and DNA target, sufficient flexibility in its EBS sequences to retarget the EBS/IBS interactions, and a high degree of target specificity. The required combination of characteristics for gene targeting is most likely to be found among group IIA and IIB introns, whose IEPs have an En domain, enabling them to retrohome efficiently to both DNA strands without relying on nascent strands at DNA replication forks to prime reverse transcription. Group IIC introns, which recognize DNA stem–loop structures, have short EBS1/IBS1 interactions, and encode proteins lacking an En domain, are likely to have insufficient flexibility or specificity for gene targeting. The procedures described here for constructing a highly active EcI5-ΔORF derivative and rapidly determining targeting rules should be applicable to other group II introns, which when similarly developed will provide a wide range of different IEP target specificities and potentially other useful characteristics.

MATERIALS AND METHODS

Bacterial strains and growth conditions

E. coli HMS174(DE3) recA1 (Novagen) was used for intron-mobility experiments, and DH5α and DH10B (Invitrogen) were used for cloning. Unless specified otherwise, cells were grown in Luria–Bertani (LB) medium, and antibiotics were added at the following concentrations: ampicillin, 100 μg/mL; chloramphenicol, 25 μg/mL; tetracycline, 25 μg/mL.

Recombinant plasmids

Previously described donor plasmids for the Ll.LtrB intron were pACD-LtrB, which expresses the full-length intron (Guo et al. 2000), and pACD2X, which expresses a 906-nt Ll.LtrB-ΔORF intron with a phage T7 promoter inserted in DIV (San Filippo and Lambowitz 2002). Both are pACYC184-based plasmids that carry a camR marker on the vector backbone and use a T7lac promoter to express the intron and short flanking exon sequences. In pACD2X, the ORF encoding the IEP (LtrA protein) is cloned downstream of the 3′ exon.

Donor plasmids for EcI5 are derivatives of pACD2X. pACDF-EcI5 contains the full-length EcI5 intron and flanking exon sequences swapped for the Ll.LtrB and LtrA segments of pACD2X. It was constructed by PCR amplifying the EcI5 segment of E. coli virulence plasmid pO157 (Burland et al. 1998), using primers EcI5-5′exon and EcI5-3′exon, which append XbaI and PstI sites, respectively (see Supplemental Table 2 for primer sequences). The PCR products were 5′ phosphorylated with phage T4 polynucleotide kinase (New England BioLabs), blunt-ended with T4 DNA polymerase (Invitrogen), and cloned between the blunt-ended HindIII and XhoI sites of pACD2X. Mutant derivatives of pACDF-EcI5 with a deletion of intron DV (intron positions 2332–2365; ΔDV) or a point mutation in the initiation codon of the intron ORF (ATG → ATT; G611T) were constructed via PCR with primers containing the mutations.

pACD3-EcI5 contains an 879-nt EcI5-ΔORF intron and flanking exons, with the IEP cloned downstream of the 3′ exon. To construct this plasmid, 5′ and 3′ segments of EcI5 were amplified separately by PCR of plasmid pO157 DNA (see above), using primers EcI5-5′exon + EcI5-P1 and EcI5-P3 + EcI5-3′exon, respectively (Supplemental Table 2). The two PCR products were then mixed and amplified with the outside primers EcI5-5′exon and EcI5-3′exon to generate a 0.961-kb PCR product containing the EcI5-ΔORF intron and flanking exons with an MluI site inserted at the site of the ORF deletion in DIV. The PCR product was blunt ended with T4 DNA polymerase (Invitrogen) and swapped for the Ll.LtrB-ΔORF intron between the XbaI and PstI sites of pACD2X (see above). The LtrA ORF encoded downstream of the intron in pACD2X was then deleted by a single-primer PCR using primer E2-3′ORF (Supplemental Table 2), which simultaneously inserts a SwaI site. Finally, the ORF encoding the EcI5 IEP was amplified from pACDF-EcI5 by PCR using the primers EcI5-5′ORF and EcI5-3′ORF (Supplemental Table 2) to generate a 1780-bp product, which was blunt ended with T4 DNA polymerase and cloned between the SwaI and SnaBI sites of the vector. Primer EcI5-5′ORF inserts the Shine–Dalgarno sequence of expression vector pET11c (New England BioLabs) upstream of the EcI5 IEP ORF.

pACD2-EcI5 was derived from pACD3-EcI5 by cloning a T7 promoter sequence (annealed 5′-phosphorylated oligonucleotides T7MLU1-T and T7MLU1-B; Supplemental Table 2) into the MluI site in DIV. Derivatives of pACD2-EcI5 with EcI5 ORF mutations YADD → YAAA (ORF nucleotide residues 838–843 changed from GATGAT to GCGGCG), H528A (ORF nucleotide residues 1582–1584 changed from CAT to GCT), and H552A (ORF nucleotide residues 1654–1656 changed from CAC to GCC) were constructed by site-directed PCR mutagenesis with primers that contain the mutant sequence, using a QuikChange Site-Directed Mutagenesis kit (Stratagene). C-terminal truncations ΔD/En (deleted amino acid residues 429–574) and ΔEn (deleted amino acid residues 518–574) were constructed by PCR with primer DelEnS, which anneals to an upstream ORF sequence containing a BglII site, and downstream primer DelD/EnA or DelEnA, respectively, which anneal at the site of the truncation and add a TGA stop codon and a SpeI site (Supplemental Table 2). The PCR products were then digested with BglII and SpeI and swapped for the corresponding fragment of pACD2-EcI5.

pACD3-EcI5A, -EcI5C, -EcI5G, or -EcI5T are different versions of pACD3-EcI5 in which EBS3 is either A, C, G, or T and IBS3 is the complementary nucleotide residue. The derivatives were constructed by site-directed PCR mutagenesis (QuikChange Site-Directed Mutagenesis kit; Stratagene), using pACD3-EcI5 as the template, with primers EBS3N and IBS3N (Supplemental Table 2), where N is the desired EBS3 or IBS3 nucleotide residue.

Donor plasmid pBL1 is a derivative of the broad-host-range vector pJB866 (Blatny et al. 1997), which uses an m-toluic acid-inducible promoter Pm to express a cassette consisting of the Ll.LtrB-ΔORF intron with flanking exons followed by the IEP (Yao and Lambowitz 2007). A parallel construct expressing EcI5 targetron LacZ1806s was constructed by PCR amplifying the EcI5-ΔORF intron, flanking exon sequences, and EcI5 IEP from pACD3-EcI5A containing the retargeted intron (see below), using primers EcI5 5′ and EcI5 3′ (Supplemental Table 2), then digesting the PCR product with HindIII and XhoI and cloning it between the corresponding sites of pJB866.

Recipient plasmid pBRR3-ltrB contains the Ll.LtrB intron-insertion site (ligated exon 1 and 2 sequences of the ltrB gene from position −30 upstream to +15 downstream from the intron-insertion site) cloned upstream of a promoterless tetR gene in an AmpR pBR322-based vector (Guo et al. 2000; Karberg et al. 2001). pBRR-EcI5 and pBRR3-EcI5 contain different lengths of the EcI5 insertion site (i.e., ligated 5′- and 3′-exon sequences flanking EcI5 in virulence plasmid pO157) cloned in place of the Ll.LtrB insertion site in pBRR3-ltrB. The ligated exon sequences in pBRR-EcI5 extend from positions −30 to +5 from the intron-insertion site, and those in pBRR3-EcI5 extend from positions −30 to +15. The plasmids were constructed by annealing two 5′-phosphorylated oligonucleotides, which contain the EcI5 target sequences and append AatII and EcoRI sites, and then swapping the annealed oligonucleotides for the AatII and EcoRI fragment of pBRR3-ltrB.

All constructs were sequenced to confirm the expected modifications and lack of adventitious mutations in regions subjected to PCR.

Donor and recipient plasmid libraries

pACD2-EcI5 donor plasmid libraries used in selection experiments contain randomized nucleotide residues at EBS1 positions −7 to −1, EBS2 positions −13 to −7, or EBS3 position +1. The EBS1 and EBS2 libraries also contain randomized nucleotide residues at the corresponding IBS1 and IBS2 positions in the 5′ exon to provide complementary sequences that can base-pair with the randomized EBS1 and EBS2 sequences for RNA splicing. The libraries were constructed by a two-step PCR of pACD2-EcI5 with primers that introduce the random nucleotide residues. In the first step, two parallel PCRs were done using the following primers: EBS1 library, primers EcI5IBS1-N + EcI5BASWT and EcI5EBS1-N + EcI5AVA2AS; EBS2 library, primers EcI5IBS2-N + EcI5EBS2-N and EcI5EBS1WT + EcI5AVA2AS; and EBS3 library, primers pACD229s + EcI5 347a and EcI5EBS3N + EcI5AVA2AS (Supplemental Table 2). In the second step, the gel-purified PCR products from the first step were used as templates in a second PCR with the outside primers (underlined), which append XbaI and AvaII sites. The products were digested with XbaI and AvaII and swapped for the corresponding fragment of a pACD2-EcI5 intermediate plasmid that contained a stuffer 33-bp oligonucleotide inserted between the XbaI and AvaII sites. The use of this intermediate eliminates the possibility of library contamination by undigested or self-ligated wild-type donor plasmids.

Recipient plasmid libraries containing random nucleotide residues at different target site positions were constructed by PCR using primers that introduce random bases at the desired positions and append AatII and EcoRI sites, enabling the segment to be swapped for the corresponding segment of pBRR3-ltrB. Libraries and the primers used to construct them were as follows: −35 to −14/+2 to +20 library, EcI5-5′N22 + EcI5-3′N19; IBS1 library, EcI5 IBS1A + EcI5 IBS1S; IBS2 library, EcI5 IBS2A + EcI5 IBS2S; and IBS3 library, EcI5 IBS3A + EcI5IBS3S (Supplemental Table 2).

After construction, the libraries were electroporated into E. coli DH10B or DH5α, and the cells were grown overnight at 37°C in LB medium containing chloramphenicol (donor plasmid libraries) or ampicillin (recipient plasmid libraries). The complexity of the libraries was estimated by plating the transformed cells on LB agar containing the appropriate antibiotic and found to be 10⁶ to 6 × 10⁷. At least 25 colonies from each library were analyzed by PCR and sequencing and found to have unique combinations of randomized nucleotide residues.

Intron mobility and selection experiments

For mobility assays with the full-length EcI5 and Ll.LtrB introns, donor and recipient plasmids were cotransformed into E. coli HMS174(DE3), and cells were grown overnight at 37°C in LB medium containing ampicillin and chloramphenicol. The cells were then diluted 1/100 into fresh LB medium with the same antibiotics, grown at 37°C to early log phase (O.D.595 = 0.2–0.3), diluted 10-fold into fresh LB medium, and induced with IPTG under conditions specified in the figure legends for individual experiments. To detect intron integration by PCR, plasmid DNA was isolated from 5 mL of the induced culture by using a QIAprep Spin Miniprep Kit (Qiagen), and the 5′-integration junction was amplified by PCR with the primer RECSEQ01R and an intron-specific primer, either EcI5MOB01 for EcI5 or LtrBAs2.2 for Ll.LtrB (Supplemental Table 2). For quantitative mobility assays using the TetR selection system, serial dilutions of the cells were plated onto LB agar containing tetracycline plus ampicillin or ampicillin alone. Mobility efficiencies were determined as the ratio of (TetR + AmpR)/AmpR colonies (Guo et al. 2000).
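The mobility-efficiency calculation can be sketched as follows. This is a minimal illustration, assuming that colony counts from each plate are first converted to colony-forming units per mL of undiluted culture; the function names, plated volume, dilution factors, and colony counts are hypothetical and are not taken from the experiments described here.

```python
def cfu_per_ml(colonies: int, fold_dilution: float, plated_volume_ml: float) -> float:
    """Colony-forming units per mL of undiluted culture.

    fold_dilution is the total fold dilution of the plated sample
    (e.g., 1e4 for a 10^-4 dilution).
    """
    if fold_dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("fold_dilution and plated_volume_ml must be positive")
    return colonies * fold_dilution / plated_volume_ml


def mobility_efficiency(tet_amp_cfu: float, amp_cfu: float) -> float:
    """Ratio of tetracycline- plus ampicillin-resistant cells to
    ampicillin-resistant cells."""
    if amp_cfu <= 0:
        raise ValueError("amp_cfu must be positive")
    return tet_amp_cfu / amp_cfu


# Hypothetical plate counts, for illustration only.
tet_amp = cfu_per_ml(colonies=150, fold_dilution=1e3, plated_volume_ml=0.1)
amp = cfu_per_ml(colonies=200, fold_dilution=1e5, plated_volume_ml=0.1)
print(f"mobility efficiency: {100 * mobility_efficiency(tet_amp, amp):.2f}%")
```

Converting to a common unit before taking the ratio allows the two selective conditions to be plated at different dilutions.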

For selection experiments, donor and recipient plasmids or libraries were electroporated into HMS174(DE3), and the transformants were allowed to recover by growth in 1-mL SOC media for 1 h at 37°C. A small portion of the culture (10 μL) was then removed, serially diluted, and plated on LB agar containing chloramphenicol and ampicillin to yield unselected clones that were sequenced via colony PCR to determine nucleotide frequency biases in the libraries. The remainder of the culture was diluted with 9 mL of LB medium containing ampicillin, chloramphenicol, and 20 mM glucose, the latter being added to decrease leaky expression of the intron from the T7lac promoter. After growing overnight at 37°C, the cells were harvested by centrifugation, washed with LB medium without glucose, and resuspended in 10 mL of fresh LB medium without glucose. In the experiment of Figure 5, a 5 mL portion of the culture was inoculated into 20 mL of LB medium containing 100 μM IPTG and induced for 1 h at 37°C, while in the experiments of Figures 6–8, a 7 mL portion of the culture was inoculated into 30 mL of LB medium and incubated for 1 h at 37°C. After the incubations, the cultures were placed on ice for 5 min, and the cells were then harvested by centrifugation, washed twice by centrifugation in ice-cold LB medium if necessary to remove IPTG, and resuspended in 1.5 mL ice-cold LB medium. Finally, the cells were serially diluted, plated on LB agar containing ampicillin, chloramphenicol, and tetracycline, and incubated overnight at 37°C.

Plasmids containing integrated introns from the selected colonies were either isolated using a QIAprep Spin Miniprep kit and sequenced (Fig. 5) or analyzed by colony PCR and sequencing (Figs. 6–8). Colony PCR was done as described (Costa and Weiner 2006) with Taq polymerase (0.05 units/μL; New England BioLabs) and 0.4 μM primers in reaction medium containing 20 mM Tris-HCl at pH 8.4, 50 mM KCl, 2.5 mM MgCl2, and 200 μM dNTPs. Reactions were incubated at 94°C for 5 min, followed by 30 cycles of 94°C for 30 sec, 55°C for 30 sec, 72°C for 1 min, and a final extension for 10 min at 72°C.
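For planning purposes, the colony PCR program above can be written down as data and its total programmed hold time computed. This is only a sketch: the step names are conventional labels added here for readability, and the total ignores ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str      # descriptive label (added here; not from the protocol text)
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # hold time in seconds


INITIAL = Step("initial denaturation", 94, 5 * 60)
CYCLE = (
    Step("denaturation", 94, 30),
    Step("annealing", 55, 30),
    Step("extension", 72, 60),
)
N_CYCLES = 30
FINAL = Step("final extension", 72, 10 * 60)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


print(f"programmed hold time: {total_hold_seconds() / 60:.0f} min")
```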

The nucleotide frequency at each of the randomized positions in the selected introns and target sites was corrected for biases in the initial pool by calculating the ratio $R_{n,p}$ of the frequency of each of the four nucleotides $n$ at that position $p$ in the selected introns ($F^{\mathrm{sel}}_{n,p}$) to its frequency in the initial pool ($F^{\mathrm{pool}}_{n,p}$) by using the equation

$$R_{n,p} = \frac{F^{\mathrm{sel}}_{n,p}}{F^{\mathrm{pool}}_{n,p}}$$

The ratio at each position was then normalized to 1 by using the equation

$$\hat{R}_{n,p} = \frac{R_{n,p}}{\sum_{n' \in \{\mathrm{A,C,G,T}\}} R_{n',p}}$$

where $\hat{R}_{n,p}$ is the normalized ratio. For a given nucleotide residue, a normalized ratio of 0.25 (or 25%) indicates no preference at that position, while larger values indicate selection for the nucleotide residue, and smaller values indicate selection against the nucleotide residue. The normalized frequencies were used to generate a sample set of 100 active target sites and displayed in “WebLogo” format (Crooks et al. 2004).
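The bias correction described in this section can be sketched for a single randomized position as follows. This is a minimal illustration, not the script used for the analysis: the function names are hypothetical, and the handling of nucleotides absent from the initial pool (raising an error) is an assumption, since the text does not say how such cases were treated.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def nucleotide_frequencies(bases: str) -> dict:
    """Frequency of each nucleotide among the bases observed at one position."""
    counts = Counter(bases.upper())
    total = sum(counts[n] for n in NUCLEOTIDES)
    if total == 0:
        raise ValueError("no A, C, G, or T observed at this position")
    return {n: counts[n] / total for n in NUCLEOTIDES}


def normalized_ratios(selected_freq: dict, pool_freq: dict) -> dict:
    """Ratio of selected to initial-pool frequency for each nucleotide,
    rescaled so that the four ratios at the position sum to 1."""
    ratios = {}
    for n in NUCLEOTIDES:
        if pool_freq[n] <= 0:
            # ASSUMPTION: a nucleotide missing from the pool is treated as an error.
            raise ValueError(f"nucleotide {n} is absent from the initial pool")
        ratios[n] = selected_freq[n] / pool_freq[n]
    total = sum(ratios.values())
    if total == 0:
        raise ValueError("all ratios are zero")
    return {n: r / total for n, r in ratios.items()}


# Hypothetical bases read at one position in unselected and selected clones.
pool = nucleotide_frequencies("AACCGGTTAC")
selected = nucleotide_frequencies("AAAAAAACGT")
for n, value in normalized_ratios(selected, pool).items():
    print(n, round(value, 3))
```

When the selected frequencies equal the pool frequencies, every ratio is 1 and each normalized value is 0.25, matching the no-preference case described above.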

Retargeting EcI5 to insert into different sites

Retargeted EcI5-ΔORF introns were constructed and expressed from donor plasmids pACD3-EcI5A, -EcI5C, -EcI5G, or -EcI5T, with the choice of donor plasmid dictated by the EBS3 nucleotide residue complementary to the IBS3 nucleotide residue in the DNA target site. The intron was retargeted by a two-step PCR, using the wild-type donor plasmid as template with three unique primers (P1, P2, and P3) and one fixed primer (P4). P1 corresponds to 5′-exon positions −30 to −14, with modifications at positions −13 to −1 to make IBS1 and IBS2 in the donor plasmid complementary to the retargeted EBS1 and EBS2 for efficient RNA splicing; P2 is complementary to intron positions +238 to +296 with modifications at positions +253 to +257 to make EBS2 complementary to IBS2 in the DNA target site; P3 corresponds to intron positions +277 to +336, with modifications at positions +316 to +320 (5 bp) or +321 (6 bp) to make EBS1 complementary to IBS1 in the DNA target site; P4 is a fixed primer complementary to intron positions +473 to +505. First, overlapping segments of the donor plasmid were amplified by two parallel PCRs, one using primers P1 and P2 and the other using primers P3 and P4. Then, the gel-purified PCR products from the first PCR step were mixed and amplified with the outside primers P1 and P4 to generate a 549-bp PCR product containing sequences corresponding to the 5′ exon and 5′ end of the intron (nucleotide positions E1 −30 to intron +505). The final PCR product was purified in a 0.8% agarose gel, digested with XbaI and AvaII, and swapped for the corresponding fragment of the donor plasmid.
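The complementarity logic behind the choice of donor plasmid and the retargeted EBS sequences can be sketched as follows. Several assumptions are made for illustration: the suffix of each donor plasmid name is taken to denote its EBS3 residue, only Watson–Crick pairs are considered, sequences are written in the DNA alphabet (T in place of U), and the target sequences and function names are hypothetical. Primer design itself is not covered.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def choose_donor_plasmid(ibs3: str) -> str:
    """Donor plasmid whose EBS3 residue is complementary to the target IBS3.

    ASSUMPTION: the plasmid name suffix (A, C, G, or T) denotes EBS3.
    """
    if len(ibs3) != 1:
        raise ValueError("IBS3 is a single nucleotide residue")
    return "pACD3-EcI5" + COMPLEMENT[ibs3.upper()]


def design_ebs(ibs: str) -> str:
    """EBS sequence (5' to 3') that pairs antiparallel with a target IBS (5' to 3')."""
    return reverse_complement(ibs)


# Hypothetical target-site elements, for illustration only.
target = {"IBS2": "ACGTA", "IBS1": "GGCAT", "IBS3": "C"}
print(choose_donor_plasmid(target["IBS3"]))
print("EBS1:", design_ebs(target["IBS1"]))
print("EBS2:", design_ebs(target["IBS2"]))
```

The same reverse-complement step applies in the other direction when IBS1 and IBS2 in the donor plasmid 5′ exon are changed to match the retargeted EBS1 and EBS2 for splicing.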

For lacZ gene targeting, the donor plasmid containing the retargeted intron was electroporated into E. coli HMS174(DE3). After recovery by growth in 1 mL of SOC media for 1 h at 37°C, the cells were diluted with 4 mL of LB medium containing chloramphenicol (25 μg/mL) for pACD3-EcI5 or tetracycline (15 μg/mL) for pBL1-EcI5 and grown overnight at 37°C. A 50 μL portion of the overnight culture was then inoculated into 5 mL of fresh LB medium containing the same antibiotic and grown to early log phase (O.D.600 = 0.2–0.3). For induction, 200 μL of the early log-phase culture were inoculated into 5 mL of LB medium containing 100 μM IPTG (for pACD3-based donor plasmids) or 4 mM m-toluic acid (for pBL1-based donor plasmids) and incubated at 37°C for 3 h. The cells were then washed with fresh LB medium and plated on LB agar containing X-gal (40 mg/L). The lacZ targeting frequency was determined by counting blue and white colonies.
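The targeting frequency from a blue/white screen can be computed as sketched below. The sketch assumes that white colonies carry the disrupted lacZ gene and blue colonies retain a functional gene, which is the usual reading of an X-gal screen but is not spelled out above; the function name and colony counts are hypothetical.

```python
def lacz_targeting_frequency(white: int, blue: int) -> float:
    """Fraction of colonies scored as lacZ disruptants.

    ASSUMPTION: white colonies are disruptants; blue colonies are not.
    """
    if white < 0 or blue < 0:
        raise ValueError("colony counts must be non-negative")
    total = white + blue
    if total == 0:
        raise ValueError("no colonies counted")
    return white / total


# Hypothetical counts from one plate, for illustration only.
print(f"{100 * lacz_targeting_frequency(white=98, blue=2):.0f}% targeting frequency")
```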

Southern hybridization

Southern hybridizations were as described (Perutka et al. 2004), using DNA isolated from colonies grown up in LB medium with a Qiagen Genomic DNA Isolation Kit (Qiagen). The blots were hybridized with a 32P-labeled probe corresponding to intron positions 362–739. The probe was generated by PCR of pACD2-EcI5 using primers Ec501s and Ec5T7-AS (Supplemental Table 2), followed by labeling with a High Prime DNA labeling kit (Roche).

SUPPLEMENTAL MATERIAL

Supplemental material can be found at http://www.rnajournal.org.

ACKNOWLEDGMENTS

This work was supported by NIH grant GM037949 and Welch Foundation Grant F-1607. We thank Steve Zimmerly (University of Calgary) for DNA from E. coli strain O157:H7 containing virulence plasmid pO157 and comments on group II intron subclasses, and Qiong Fu for performing the experiment of Supplemental Figure 1B.

Footnotes

Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.1378909.

REFERENCES

  1. Blatny J.M., Brautaset T., Winther-Larsen C., Karunakaran P., Valla S. Improved broad-host range RK2 vectors useful for high and low regulated gene expression levels in gram-negative bacteria. Plasmid. 1997;38:35–51. doi: 10.1006/plas.1997.1294.
  2. Blocker F.J.H., Mohr G., Conlan L.H., Qi L., Belfort M., Lambowitz A.M. Domain structure and three-dimensional model of a group II intron-encoded reverse transcriptase. RNA. 2005;11:14–28. doi: 10.1261/rna.7181105.
  3. Burland V., Shao Y., Perna N.T., Plunkett G., Sofia H.J., Blattner F.R. The complete DNA sequence and analysis of the large virulence plasmid of Escherichia coli O157:H7. Nucleic Acids Res. 1998;26:4196–4204. doi: 10.1093/nar/26.18.4196.
  4. Chen Y., McClane B.A., Fisher D.J., Rood J.I., Gupta P. Construction of an α toxin gene knockout mutant of Clostridium perfringens type A by use of a mobile group II intron. Appl. Environ. Microbiol. 2005;71:7542–7547. doi: 10.1128/AEM.71.11.7542-7547.2005.
  5. Chen Y., Caruso L., McClane B., Fisher D., Gupta P. Disruption of a toxin gene by introduction of a foreign gene into the chromosome of Clostridium perfringens using targetron-induced mutagenesis. Plasmid. 2007;58:182–189. doi: 10.1016/j.plasmid.2007.04.002.
  6. Coros C.J., Landthaler M., Piazza C.L., Beauregard A., Esposito D., Perutka J., Lambowitz A.M., Belfort M. Retrotransposition strategies of the Lactococcus lactis Ll.LtrB group II intron are dictated by host identity and cellular environment. Mol. Microbiol. 2005;56:509–524. doi: 10.1111/j.1365-2958.2005.04554.x.
  7. Coros C.J., Piazza C.L., Chalamcharla V.R., Belfort M. A mutant screen reveals RNase E as a silencer of group II intron retromobility in Escherichia coli. RNA. 2008;14:2634–2644. doi: 10.1261/rna.1247608.
  8. Costa G.L., Weiner M.P. Colony PCR. Cold Spring Harb. Protoc. 2006.
  9. Costa M., Michel F., Westhof E. A three-dimensional perspective on exon binding by a group II self-splicing intron. EMBO J. 2000;19:5007–5018. doi: 10.1093/emboj/19.18.5007.
  10. Cousineau B., Smith D., Lawrence-Cavanagh S., Mueller J.E., Yang J., Mills D., Manias D., Dunny G., Lambowitz A.M., Belfort M. Retrohoming of a bacterial group II intron: Mobility via complete reverse splicing, independent of homologous DNA recombination. Cell. 1998;94:451–462. doi: 10.1016/s0092-8674(00)81586-x.
  11. Crooks G.E., Hon G., Chandonia J.M., Brenner S.E. WebLogo: A sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004.
  12. Dai L., Zimmerly S. Compilation and analysis of group II intron insertions in bacterial genomes: Evidence for retroelement behavior. Nucleic Acids Res. 2002a;30:1091–1102. doi: 10.1093/nar/30.5.1091.
  13. Dai L., Zimmerly S. The dispersal of five group II introns among natural populations of Escherichia coli. RNA. 2002b;8:1294–1307. doi: 10.1017/s1355838202023014.
  14. Eckert B., Beck C.F. Overproduction of transposon Tn10-encoded tetracycline resistance protein results in cell death and loss of membrane potential. J. Bacteriol. 1989;171:3557–3559. doi: 10.1128/jb.171.6.3557-3559.1989.
  15. Eskes R., Yang J., Lambowitz A.M., Perlman P.S. Mobility of yeast mitochondrial group II introns: Engineering a new site specificity and retrohoming via full reverse splicing. Cell. 1997;88:865–874. doi: 10.1016/s0092-8674(00)81932-7.
  16. Eskes R., Liu L., Ma H., Chao M.Y., Dickson L., Lambowitz A.M., Perlman P.S. Multiple homing pathways used by yeast mitochondrial group II introns. Mol. Cell. Biol. 2000;20:8432–8446. doi: 10.1128/mcb.20.22.8432-8446.2000.
  17. Frazier C.L., San Filippo J., Lambowitz A.M., Mills D.A. Genetic manipulation of Lactococcus lactis by using targeted group II introns: Generation of stable insertions without selection. Appl. Environ. Microbiol. 2003;69:1121–1128. doi: 10.1128/AEM.69.2.1121-1128.2003.
  18. Granlund M., Michel F., Norgren M. Mutually exclusive distribution of IS1548 and GBSi1, an active group II intron identified in human isolates of group B streptococci. J. Bacteriol. 2001;183:2560–2569. doi: 10.1128/JB.183.8.2560-2569.2001.
  19. Guo H. “The target-site recognition mechanism of group II intron endonucleases and its use in gene targeting.” Ph.D. thesis, University of Texas at Austin, Austin, TX; 2000.
  20. Guo H., Zimmerly S., Perlman P.S., Lambowitz A.M. Group II intron endonucleases use both RNA and protein subunits for recognition of specific sequences in double-stranded DNA. EMBO J. 1997;16:6835–6848. doi: 10.1093/emboj/16.22.6835.
  21. Guo H., Karberg M., Long M., Jones J.P., 3rd, Sullenger B., Lambowitz A.M. Group II introns designed to insert into therapeutically relevant DNA target sites in human cells. Science. 2000;289:452–457. doi: 10.1126/science.289.5478.452.
  22. Heap J.T., Pennington O.J., Cartman S.T., Carter G.P., Minton N.P. The ClosTron: A universal gene knock-out system for the genus Clostridium. J. Microbiol. Methods. 2007;70:452–464. doi: 10.1016/j.mimet.2007.05.021.
  23. Huang H.R., Chao M.Y., Armstrong B., Wang Y., Lambowitz A.M., Perlman P.S. The DIVa maturase binding site in the yeast group II intron aI2 is essential for intron homing but not for in vivo splicing. Mol. Cell. Biol. 2003;23:8809–8819. doi: 10.1128/MCB.23.23.8809-8819.2003.
  24. Jarrell K.A., Dietrich R.C., Perlman P.S. Group II intron domain 5 facilitates a trans-splicing reaction. Mol. Cell. Biol. 1988;8:2361–2366. doi: 10.1128/mcb.8.6.2361.
  25. Jiménez-Zurdo J.I., Garcia-Rodriguez F.M., Barrientos-Duran A., Toro N. DNA target site requirements for homing in vivo of a bacterial group II intron encoding a protein lacking the DNA endonuclease domain. J. Mol. Biol. 2003;326:413–423. doi: 10.1016/s0022-2836(02)01380-3.
  26. Karberg M., Guo H., Zhong J., Coon R., Perutka J., Lambowitz A.M. Group II introns as controllable gene targeting vectors for genetic manipulation of bacteria. Nat. Biotechnol. 2001;19:1162–1167. doi: 10.1038/nbt1201-1162.
  27. Lambowitz A.M., Zimmerly S. Mobile group II introns. Annu. Rev. Genet. 2004;38:1–35. doi: 10.1146/annurev.genet.38.072902.091600.
  28. Lambowitz A.M., Mohr G., Zimmerly S. Group II intron homing endonucleases: Ribonucleoprotein complexes with programmable target specificity. In: Belfort M., et al., editors. Homing endonucleases and inteins. Springer; Berlin: 2005. pp. 121–145.
  29. Malhotra M., Srivastava S. An ipdC gene knock-out of Azospirillum brasilense strain SM and its implications on indole-3-acetic acid biosynthesis and plant growth promotion. Antonie Van Leeuwenhoek. 2008;93:425–433. doi: 10.1007/s10482-007-9207-x.
  30. Martínez-Abarca F., Barrientos-Duran A., Fernandez-Lopez M., Toro N. The RmInt1 group II intron has two different retrohoming pathways for mobility using predominantly the nascent lagging strand at DNA replication forks for priming. Nucleic Acids Res. 2004;32:2880–2888. doi: 10.1093/nar/gkh616.
  31. Mastroianni M., Watanabe K., White T.B., Zhuang F., Vernon J., Matsuura M., Wallingford J., Lambowitz A.M. Group II intron-based gene targeting reactions in eukaryotes. PLoS One. 2008;3:e3121. doi: 10.1371/journal.pone.0003121.
  32. Michel F., Ferat J.L. Structure and activities of group II introns. Annu. Rev. Biochem. 1995;64:435–461. doi: 10.1146/annurev.bi.64.070195.002251.
  33. Michel F., Umesono K., Ozeki H. Comparative and functional anatomy of group II catalytic introns—A review. Gene. 1989;82:5–30. doi: 10.1016/0378-1119(89)90026-7.
  34. Mohr G., Smith D., Belfort M., Lambowitz A.M. Rules for DNA target-site recognition by a lactococcal group II intron enable retargeting of the intron to specific DNA sequences. Genes & Dev. 2000;14:559–573.
  35. Perutka J., Wang W., Goerlitz D., Lambowitz A.M. Use of computer-designed group II introns to disrupt Escherichia coli DExH/D-box protein and DNA helicase genes. J. Mol. Biol. 2004;336:421–439. doi: 10.1016/j.jmb.2003.12.009.
  36. Pyle A.M., Lambowitz A.M. Group II introns: Ribozymes that splice RNA and invade DNA. In: Gesteland R.F., et al., editors. The RNA world. 3rd ed. Cold Spring Harbor Laboratory Press; Cold Spring Harbor, NY: 2006. pp. 469–505.
  37. Quiroga C., Roy P.H., Centron D. The S.ma.I2 class C group II intron inserts at integron attC sites. Microbiology. 2008;154:1341–1353. doi: 10.1099/mic.0.2007/016360-0.
  38. Robart A.R., Seo W., Zimmerly S. Insertion of group II intron retroelements after intrinsic transcriptional terminators. Proc. Natl. Acad. Sci. 2007;104:6620–6625. doi: 10.1073/pnas.0700561104.
  39. Rodriguez S.A., Yu J.J., Davis G., Arulanandam B.P., Klose K.E. Targeted inactivation of Francisella tularensis genes by group II introns. Appl. Environ. Microbiol. 2008;74:2619–2626. doi: 10.1128/AEM.02905-07.
  40. San Filippo J., Lambowitz A.M. Characterization of the C-terminal DNA-binding/DNA endonuclease region of a group II intron-encoded protein. J. Mol. Biol. 2002;324:933–951. doi: 10.1016/s0022-2836(02)01147-6.
  41. Sayeed S., Uzal F.A., Fisher D.J., Saputo J., Vidal J.E., Chen Y., Gupta P., Rood J.I., McClane B.A. β Toxin is essential for the intestinal virulence of Clostridium perfringens type C disease isolate CN3685 in a rabbit ileal loop model. Mol. Microbiol. 2008;67:15–30. doi: 10.1111/j.1365-2958.2007.06007.x.
  42. Shao L., Hu S., Yang Y., Gu Y., Chen J., Jiang W., Yang S. Targeted gene disruption by use of a group II intron (targetron) vector in Clostridium acetobutylicum. Cell Res. 2007;17:963–965. doi: 10.1038/cr.2007.91.
  43. Simon D.M., Clarke N.A.C., McNeil B.A., Johnson I., Pantuso D., Dai L., Chai D., Zimmerly S. Group II introns in eubacteria and archaea: ORF-less introns and new varieties. RNA. 2008;14:1704–1713. doi: 10.1261/rna.1056108.
  44. Singh N.N., Lambowitz A.M. Interaction of a group II intron ribonucleoprotein endonuclease with its DNA target site investigated by DNA footprinting and modification interference. J. Mol. Biol. 2001;309:361–386. doi: 10.1006/jmbi.2001.4658.
  45. Smith D., Zhong J., Matsuura M., Lambowitz A.M., Belfort M. Recruitment of host functions suggests a repair pathway for late steps in group II intron retrohoming. Genes & Dev. 2005;19:2477–2487. doi: 10.1101/gad.1345105.
  46. Sugimoto N., Nakano M., Nakano S. Thermodynamics-structure relationship of single mismatches in RNA/DNA duplexes. Biochemistry. 2000;39:11270–11281. doi: 10.1021/bi000819p.
  47. Toor N., Hausner G., Zimmerly S. Coevolution of group II intron RNA structures with their intron-encoded reverse transcriptases. RNA. 2001;7:1142–1152. doi: 10.1017/s1355838201010251.
  48. Toro N., Molina-Sánchez M.D., Fernández-López M. Identification and characterization of bacterial class E group II introns. Gene. 2002;299:245–250. doi: 10.1016/s0378-1119(02)01079-x.
  49. Toro N., Jiménez-Zurdo J.I., García-Rodríguez F.M. Bacterial group II introns: Not just splicing. FEMS Microbiol. Rev. 2007;31:342–358. doi: 10.1111/j.1574-6976.2007.00068.x.
  50. Wank H., San Filippo J., Singh R.N., Matsuura M., Lambowitz A.M. A reverse transcriptase/maturase promotes splicing by binding at its own coding segment in a group II intron RNA. Mol. Cell. 1999;4:239–250. doi: 10.1016/s1097-2765(00)80371-8.
  51. Xiang Q., Qin P.Z., Michels W.J., Freeland K., Pyle A.M. Sequence specificity of a group II intron ribozyme: Multiple mechanisms for promoting unusually high discrimination against mismatched targets. Biochemistry. 1998;37:3839–3849. doi: 10.1021/bi972661n.
  52. Yang J., Zimmerly S., Perlman P.S., Lambowitz A.M. Efficient integration of an intron RNA into double-stranded DNA by reverse splicing. Nature. 1996;381:332–335. doi: 10.1038/381332a0.
  53. Yang J., Mohr G., Perlman P.S., Lambowitz A.M. Group II intron mobility in yeast mitochondria: Target DNA-primed reverse transcription activity of aI1 and reverse splicing into DNA transposition sites in vitro. J. Mol. Biol. 1998;282:505–523. doi: 10.1006/jmbi.1998.2029.
  54. Yao J., Lambowitz A.M. Gene targeting in gram-negative bacteria by use of a mobile group II intron (“targetron”) expressed from a broad-host-range vector. Appl. Environ. Microbiol. 2007;73:2735–2743. doi: 10.1128/AEM.02829-06.
  55. Yao J., Zhong J., Fang Y., Geisinger E., Novick R.P., Lambowitz A.M. Use of targetrons to disrupt essential and nonessential genes in Staphylococcus aureus reveals temperature sensitivity of Ll.LtrB group II intron splicing. RNA. 2006;12:1271–1281. doi: 10.1261/rna.68706.
  56. Zhao J., Lambowitz A.M. A bacterial group II intron-encoded reverse transcriptase localizes to cellular poles. Proc. Natl. Acad. Sci. 2005;102:16133–16140. doi: 10.1073/pnas.0507057102.
  57. Zhao J., Niu W., Yao J., Mohr S., Marcotte E.M., Lambowitz A.M. Group II intron protein localization and insertion sites are affected by polyphosphate. PLoS Biol. 2008;6:1306–1320. doi: 10.1371/journal.pbio.0060150.
  58. Zhong J., Lambowitz A.M. Group II intron mobility using nascent strands at DNA replication forks to prime reverse transcription. EMBO J. 2003;22:4555–4565. doi: 10.1093/emboj/cdg433.
  59. Zhong J., Karberg M., Lambowitz A.M. Targeted and random bacterial gene disruption using a group II intron (targetron) vector containing a retrotransposition-activated selectable marker. Nucleic Acids Res. 2003;31:1656–1664. doi: 10.1093/nar/gkg248.
  60. Zimmerly S., Guo H., Eskes R., Yang J., Perlman P.S., Lambowitz A.M. A group II intron RNA is a catalytic component of a DNA endonuclease involved in intron mobility. Cell. 1995a;83:529–538. doi: 10.1016/0092-8674(95)90092-6.
  61. Zimmerly S., Guo H., Perlman P.S., Lambowitz A.M. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell. 1995b;82:545–554. doi: 10.1016/0092-8674(95)90027-6.
  62. Zimmerly S., Hausner G., Wu X. Phylogenetic relationships among group II intron ORFs. Nucleic Acids Res. 2001;29:1238–1250. doi: 10.1093/nar/29.5.1238.

Articles from RNA are provided here courtesy of The RNA Society
