Abstract
We applied a comprehensive data mining strategy to examine the repertoires of rat and mouse odorant receptors (ORs) and type 1 pheromone receptors (V1Rs) using the rn3 and mm5 genome assemblies, respectively. 1576 rat OR genes were identified, including 292 pseudogenes. The rat V1R repertoire is composed of 115 intact genes and 72 pseudogenes. The mouse OR and V1R databases were updated using the new assembly mm5, from which 1375 mouse ORs and 308 V1Rs were identified, with more than a hundred putative pseudogenes from mm2 now identified as intact because of the higher sequence quality. With these new data we have conducted a series of genomic analyses of the OR and V1R genes from mouse and rat. Orthologous OR clusters were identified in mouse and rat, and comparative analysis was performed at three incremental levels: families, coding sequences, and motifs. At the family level, we found that V1R genes have more species-specific families than OR genes. About 20 percent of intact V1R genes have no orthologous counterpart in the same family, whereas less than 1 percent of intact ORs are similarly isolated. At the coding sequence level, OR genes are more conserved between mouse and rat than V1R genes. OR genes share greater similarity with their orthologous counterparts than with their closest paralogous neighbor, whereas V1R genes show the opposite tendency. Motifs were identified to obtain biological insights. Motifs specific for species or families were found in V1R genes, and family-specific motifs were found in OR genes; these may contribute to the differences in pheromone-dependent behaviors and odor perception between mouse and rat.
Keywords: Olfactory receptor, Odorant receptor, Pheromone receptor, OR, V1R, Motif, Comparative Genomics
Introduction
A wide variety of chemicals in the environment are critical to an animal’s survival and olfactory systems have developed large and diverse receptor gene families in response to this demand. Many mammals, as well as other non-mammalian vertebrates, have two anatomically independent chemosensory systems: the main olfactory system and the vomeronasal system. To a first approximation the main olfactory system is devoted to discriminating environmental odors while the vomeronasal system detects pheromones and other molecules important in mediating social interactions. Both olfactory receptors and vomeronasal receptors belong to the superfamily of seven transmembrane-domain, G protein-coupled receptors (GPCRs). Recently, additional roles for olfactory receptors have been proposed, most notably in axon guidance [1].
Since their initial discovery in rat [2], OR genes have been identified in various species of both invertebrates and vertebrates [3]. The genes encoding ORs constitute a large gene superfamily, especially in mammals. The mouse genome has ~1200 ORs, there are ~1000 in dog (Canis familiaris), and ~900 in human. In mouse, ~20% of ORs are pseudogenes, whereas this fraction is much higher (~60-70%) in human [4; 5; 6; 7]. These data have been updated and confirmed with improved and new data from genome sequencing projects, and through hybridization and microarray experiments [8; 9].
The vomeronasal receptors are classified into two major families, V1R and V2R. V1Rs are members of the Class A GPCRs while V2Rs belong to the Class C GPCRs. Both are expressed primarily in the vomeronasal organ (VNO), a tissue distinct from the main olfactory epithelium (MOE). The mouse V1R repertoire was defined using a data mining strategy similar to that employed for ORs. It consisted of 164 potentially intact genes divided into 12 families [10; 11]. Primates appear to have lost most V1R genes [12]. In contrast to V1Rs, V2Rs possess a long extracellular N-terminus encoded by 5 additional exons, making it more difficult to extract their coding sequences from the genomic database. Deletions of vomeronasal receptor genes have resulted in alterations in social behaviors consistent with their primary function as pheromone detectors [13; 14].
The availability of genome sequences for two rodents, Mus musculus and Rattus norvegicus [15; 16], which diverged ~12-24 million years ago [17], provides an opportunity to examine the diversification of these large families of odorant and pheromone receptors during this relatively short time period. We have conducted a comprehensive data mining effort to extract the repertoire of ORs and V1Rs in rodents using the updated genomes, followed by a comparative genomic analysis to investigate species specificity for both rat and mouse. We find that the rat has a larger OR repertoire, but a smaller V1R repertoire, than mouse. V1R genes are more species-specific than OR genes, which seems reasonable given their pheromone-related functions. OR genes tend to be more similar to their orthologous counterparts from the other species than to their paralogous neighbors from the same species, whereas V1R genes show the opposite tendency. We have also identified conserved motifs and considered their possible biological functions.
Results and Discussion
Data mining for OR/V1R repertoire
We have used a comprehensive data mining system to search for candidate OR/V1R gene sequences in the updated genomes from UCSC (http://genome.ucsc.edu/). Using a method similar to that of Zhang [4], we conducted exhaustive TBLASTN searches to ensure high sensitivity for OR-like/V1R-like sequences, using known mammalian ORs/V1Rs as queries. To update the mouse OR/V1R repertoire, the high-speed BLAT tool was used in place of TBLASTN so that all searches could be performed in a single run. The output sequences were subjected to a series of further analyses incorporating conceptual translation, profile HMM searches and BLASTP searches to determine which were reliable OR/V1R sequences. FASTY3, along with a database of ~1000 previously identified mammalian full-length ORs, was used to perform conceptual translation to identify the coding region of all candidate ORs. The same full-length mammalian ORs were also used to build an HMM model for profile searches to estimate the probability that the candidates are true ORs. For V1Rs, ~170 previously identified rodent full-length V1R genes comprised the database for the FASTY3 and HMM models. Except for the initial TBLASTN search, which was done using the Ensembl server (http://www.ensembl.org/), all other analysis steps were automated by investigator-developed programs (for details, see the Methods).
From the comprehensive data mining effort, we identified the nearly complete repertoire of rat ORs, consisting of 1576 genes. We also updated the repertoire of mouse ORs, which now contains 1375 genes (shown in Table 1). Together, these constitute by far the largest gene families in the mammalian genome. 1284 rat ORs are potentially functional genes and 292 (18%) appear to be pseudogenes, while 1194 mouse ORs are putatively intact genes and only 181 (13%) appear to be pseudogenes. Some of this difference, however, is likely attributable to the difference in sequence quality between the rat genome (rn3, Jun 2003) and the mouse genome (mm5, May 2004). Here we have used the same criterion as Zhang [4] to define pseudogenes: they contain at least two frameshifts or stop codons within the coding region. As updated versions of the mouse genome have become available, a number of pseudogenes have been re-classified as intact. Therefore, we believe that the rat is likely to have more than the 1284 intact genes currently identified, and the size of the intact repertoire may increase slightly.
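The pseudogene criterion described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function names are hypothetical, and it assumes that frameshifts and stop codons are counted together toward the threshold of two, which the text does not spell out.

```python
def classify_candidate(n_frameshifts: int, n_stop_codons: int) -> str:
    """Label a candidate receptor gene from the disruptions in its coding region.

    Assumption: frameshifts and in-frame stop codons are pooled, and two or
    more disruptions in total mark the gene as a pseudogene.
    """
    disruptions = n_frameshifts + n_stop_codons
    return "pseudogene" if disruptions >= 2 else "intact"


def pseudogene_percentage(n_pseudogenes: int, n_total: int) -> float:
    """Percentage of a repertoire classified as pseudogenes."""
    return 100.0 * n_pseudogenes / n_total


# Rat OR counts reported in the text: 292 pseudogenes out of 1576 genes.
rat_or_pct = pseudogene_percentage(292, 1576)
print(f"rat OR pseudogenes: {rat_or_pct:.1f}%")  # prints "rat OR pseudogenes: 18.5%"
```

A candidate with a single disruption stays in the intact set under this rule, which is why improved sequence quality can move genes from the pseudogene set to the intact set.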
Table 1.
Number of OR/V1R genes, clusters and families
Gene type | Rat (rn3) ORs | Rat (rn3) V1Rs | Mouse (mm5) ORs | Mouse (mm5) V1Rs |
---|---|---|---|---|
Intact genes | 1284 | 115 | 1194 | 191 |
Pseudogenes | 292 | 72 | 181 | 117 |
Total | 1576(153)* | 187 | 1375(158)* | 308 |
Percentage of pseudogenes | 18.5% | 38.5% | 13.1% | 38.0% |
Number of clusters** | 41 | 9 | 43 | 10 |
Number of isolated single genes | 23 | 5 | 21 | 8 |
Percentage of isolated genes | 1.5% | 2.7% | 1.5% | 2.6% |
Phylogenetic families | 160 | 12 | 149 | 13 |
*The number of Class I ORs is shown in brackets.
**Definition of a cluster: the loci are no more than 1 Mb apart, and at least 2 genes are included.
In addition to the OR repertoire, we also examined V1R genes (shown in Table 1). Using the same strategy as for OR genes, the rat V1R repertoire was identified and the mouse V1R repertoire was updated. In rat we identified 187 genes in total, of which 115 are potentially intact (pseudogenes constitute ~39%). The updated mouse V1R repertoire consists of 308 genes, of which 191 are putatively intact (pseudogenes account for ~38%). Here as well, the quality of the genome sequences may affect the fraction of pseudogenes. It should also be noted that the size of the V1R repertoire, especially in rat, could be underestimated if some highly specific V1R genes exist that are not detected with the known V1Rs used as queries.
All of the sequences in our database were <98% identical with each other, except in a few cases where two very similar genes were unambiguously located at different genomic locations.
Genomic Distribution of ORs and V1Rs
Rat OR and V1Rs
In the rn3 (rat) assembly, 1525 OR genes and 181 V1R genes are mapped to specific genomic locations. 96.8% of the OR genes are mapped, as is a similar percentage of V1R genes. The number of OR/V1R genes on each chromosome is shown in Fig 1a. Chromosomes 1, 3, 7, 8 and 10 have the largest numbers of OR genes, whereas no OR genes are located on chromosomes 18 and Y. The mapped V1R genes are found on chromosomes 1, 4, 7, 14 and 17; chromosomes 14 and 17 carry only one isolated gene each. Both OR and V1R genes tend to form tight clusters, but they are not intermingled with one another. In total, OR genes are distributed in 41 clusters, and V1R genes in 9 clusters. The distribution of OR and V1R genes is shown in Fig 1b, using chromosomes 1 and 4 as examples. Isolated single genes occur only rarely (1.5% for ORs, 2.7% for V1Rs) (shown in Table 1). Among the 23 isolated OR genes, 14 are pseudogenes, a fraction much higher than the average fraction of pseudogenes in the rat OR repertoire. For V1R genes, only 2 of the 5 single genes are intact.
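The cluster definition used here (loci no more than 1 Mb apart, at least 2 genes) can be sketched as a single pass over position-sorted genes. This is an illustrative sketch only: it assumes the 1 Mb limit applies to the gap between neighboring genes on the same chromosome, and the gene names and coordinates in the example are hypothetical.

```python
from collections import defaultdict

MAX_GAP_BP = 1_000_000  # 1 Mb, the distance limit quoted for a cluster


def _close_group(chrom, group, clusters, isolated):
    """File a finished run of neighboring genes as a cluster or an isolated gene."""
    names = [name for _, name in group]
    if len(names) >= 2:
        clusters.append((chrom, names))
    else:
        isolated.append((chrom, names[0]))


def cluster_genes(genes, max_gap=MAX_GAP_BP):
    """Group genes into clusters.

    genes: iterable of (name, chromosome, start_bp) tuples.
    Returns (clusters, isolated), where clusters is a list of
    (chromosome, [gene names]) and isolated is a list of (chromosome, name).
    """
    by_chrom = defaultdict(list)
    for name, chrom, start in genes:
        by_chrom[chrom].append((start, name))

    clusters, isolated = [], []
    for chrom, items in by_chrom.items():
        items.sort()
        group = [items[0]]
        for prev, cur in zip(items, items[1:]):
            if cur[0] - prev[0] <= max_gap:
                group.append(cur)          # still within 1 Mb of the previous gene
            else:
                _close_group(chrom, group, clusters, isolated)
                group = [cur]
        _close_group(chrom, group, clusters, isolated)
    return clusters, isolated


# Hypothetical coordinates, for illustration only.
example = [
    ("g1", "chr1", 100_000),
    ("g2", "chr1", 600_000),
    ("g3", "chr1", 5_000_000),
    ("g4", "chr4", 1_000),
    ("g5", "chr4", 900_000),
]
clusters, isolated = cluster_genes(example)
```

With the example above, `g1` and `g2` form one cluster, `g4` and `g5` form another, and `g3` is reported as an isolated single gene.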
Fig. 1. Chromosomal distribution of rat and mouse OR/V1R genes.
Blue, intact OR genes; red, OR pseudogenes; green, intact V1R genes; purple, V1R pseudogenes. The number of OR/V1R genes per 1Mb is shown as bars on each chromosome. The height of each bar is proportional to the number of genes in that locus. (a) The number of OR/V1R genes on each chromosome of rat. “Un” represents the sequences unmapped in current rn3 assembly. There is no OR/V1R gene on either chromosome 18 or Y. (b) Rat chromosomes 1 and 4 are drawn according to the rn3 assembly. The cytogenetic map of each chromosome is shown under the scaffold assembly in scale. The number of OR/V1R genes per 1Mb is shown as bars on each chromosome. The black arrow points to one OR cluster that is very close to V1R genes, but not intermingled with them. Rat Class I OR cluster is indicated with a grey arrowhead. (c) The number of OR/V1R genes on each chromosome of mouse. “Un” represents the sequences from unmapped scaffolds of mm5 assembly. There are no OR genes on chromosome 18. (d) Mouse chromosome 6 and 7 are drawn according to the mm5 assembly. The cytogenetic map of each chromosome is shown under the scaffold assembly in scale. The number of OR/V1R genes per 1Mb is shown as bars on each chromosome. Mouse Class I OR cluster is indicated with a grey arrowhead.
As shown in Fig 1a, chromosomes 1 and 3 harbor the largest numbers of OR genes, with 344 ORs each. Characteristics such as the number and density of genes and the pseudogene frequency vary among the clusters (see Table 2). The largest cluster, which is located on chromosome 3 and harbors about 1/6 of the entire rat OR repertoire, has the lowest fraction of pseudogenes, while in the third largest cluster on chromosome 7, containing 173 ORs, nearly 50% are pseudogenes. Across clusters, there is no obvious correlation (Pearson r=0.25) between the fraction of pseudogenes and the density of clusters. For rat V1Rs, only clusters with more than 10 genes are shown in Table 2. In general, rat V1R clusters have a higher percentage of pseudogenes but a lower average density compared to rat OR clusters.
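As a rough illustration of the correlation mentioned above, the Pearson coefficient can be computed from the pseudogene percentage and the average inter-gene distance of each cluster. The sketch below uses only the ten rat OR clusters listed in Table 2, whereas the reported r=0.25 was presumably computed over a different set of clusters, so the value obtained here is not expected to reproduce it. The function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Rat OR clusters from Table 2, in table order.
pseudogene_pct = [6.9, 18.5, 45.7, 12.1, 9.6, 19.7, 8.6, 15.2, 11.1, 9.1]
kb_per_gene = [23.5, 28.7, 35.5, 31.6, 19.8, 48.9, 42.1, 31.0, 40.6, 27.7]

r = pearson_r(pseudogene_pct, kb_per_gene)
print(f"Pearson r over the ten listed rat OR clusters: {r:.2f}")
```

A larger kb-per-gene value means a sparser cluster, so a positive coefficient would indicate that sparser clusters tend to carry more pseudogenes.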
Table 2.
Characteristics of OR/V1R Clusters
Rat OR | | | | Mouse OR | | | |
---|---|---|---|---|---|---|---|
Location | Number of genes | Percentage of pseudogenes | Average distance (kb per gene) | Location | Number of genes | Percentage of pseudogenes | Average distance (kb per gene) |
Chr3:68.5-74.6Mb | 262 | 6.9% | 23.5 | Chr2:85.2-90.2Mb | 262 | 8.8% | 19.1 |
Chr1:160.1-166.7Mb | 227 | 18.5% | 28.7 | Chr7:90.0-92.9Mb | 152 | 9.9% | 19.1 |
(same rat cluster) | | | | Chr7:93.9-96.2Mb | 73 | 15.1% | 30.4 |
Chr7:2.3-8.5Mb | 173 | 45.7% | 35.5 | Chr10:129.1-130.3Mb | 62 | 8.1% | 19.3 |
Chr8:39.0-43.1Mb | 132 | 12.1% | 31.6 | Chr9:37.8-40.2Mb | 114 | 10.5% | 21.8 |
Chr20:0.09-1.5Mb | 73 | 9.6% | 19.8 | Chr17:35.6-36.9Mb | 54 | 20.4% | 23.7 |
Chr8:16.1-19.3Mb | 66 | 19.7% | 48.9 | Chr9:18.6-20.2Mb | 44 | 18.2% | 36.8 |
Chr1:214.8-217.2Mb | 58 | 8.6% | 42.1 | Chr19:11.1-13.2Mb | 81 | 11.1% | 26.1 |
Chr10:60.2-61.7Mb | 46 | 15.2% | 31.0 | Chr11:73.0-74.0Mb | 44 | 18.2% | 23.0 |
Chr10:44.1-45.9Mb | 45 | 11.1% | 40.6 | Chr11:58.1-59.2Mb | 26 | 11.5% | 44.5 |
Chr3:96.7-97.9Mb | 44 | 9.1% | 27.7 | Chr2:111.1-112.1Mb | 48 | 6.3% | 19.5 |
| |||||||
Rat V1R | | | | Mouse V1R | | | |
| |||||||
Chr1:70.1-73.8Mb | 40 | 37.5% | 91.9 | Chr13:21.4-22.6Mb | 68 | 44.1% | 17.8 |
Chr1:57.0-59.8Mb | 37 | 40.5% | 75.5 | Chr6:56.9-58.5Mb | 44 | 31.8% | 36.1 |
Chr4:86.2-87.3Mb | 32 | 21.9% | 36.0 | Chr7:11.0-12.8Mb | 40 | 27.5% | 43.8 |
Chr4:124.1-125.9Mb | 26 | 34.6% | 69.1 | Chr7:5.0-7.3Mb | 34 | 41.2% | 65.9 |
Chr1:62.6-63.3Mb | 16 | 43.8% | 45.6 | Chr6:90.0-90.6Mb | 25 | 40.0% | 22.8 |
Chr7:15.0-15.9Mb | 12 | 50.0% | 73.2 | Chr17:19.1-20.0Mb | 18 | 22.2% | 50.2 |
Mouse OR and V1Rs
In the mm5 assembly, 1303 ORs and 253 V1Rs are mapped to specific genomic locations, whereas 72 ORs and 53 V1Rs could not be mapped. In earlier reports based on the mm2 assembly (Zhang et al 2004), about one-third of V1R genes could not be mapped, but our update using the mm5 assembly reduces this to only 17% unmapped genes. The number of OR/V1R genes on each chromosome is shown in Fig 1c. Chromosomes 2, 7, 9, 10 and 11 have the largest numbers of OR genes, whereas no OR genes are found on chromosomes 18 and Y. The mapped V1R genes are found on chromosomes 4, 6, 7, 13, 17 and 18. Five isolated V1R genes are found on chromosomes 1, 2, 5, 14 and X. Similar to the rat OR and V1R genes, mouse ORs and V1Rs also tend to form tight clusters. Mouse OR genes are distributed in 43 clusters, and V1R genes in 10 clusters. The distribution of mouse OR and V1R genes is shown in Fig 1d, taking chromosomes 6 and 7 as examples. Isolated genes also occur rarely (1.5% for ORs, 2.6% for V1Rs), but have a higher fraction of pseudogenes: 11 of the 21 single ORs and 5 of the 8 isolated V1Rs do not appear to be functional.
As shown in Fig 1c, the largest numbers of ORs are found on chromosomes 2 and 7, which harbor 344 and 267 ORs respectively. Table 2 lists, on the right, the characteristics of the mouse clusters that are orthologous with the rat OR clusters on the left. Note that rat cluster 1_160 splits into two clusters (7_90 and 7_94) in mouse. The second largest mouse cluster, which consists only of Class I ORs, is one of the densest clusters. For mouse V1Rs, clusters having more than 10 genes are listed. Unexpectedly, the largest mouse V1R cluster, 13_21, has the highest fraction of pseudogenes and the shortest average inter-gene distance. Why some clusters have a much higher proportion of pseudogenes than others is unknown.
Families of Rodent ORs and V1Rs
Phylogenetic Analysis of OR/V1R Repertoire
OR and V1R genes can be divided into families based on a phylogenetic analysis. Members from the same family are defined as sharing 40% or higher amino acid identity and more than 50% bootstrap support. Similar to the mouse, rat ORs also comprise two broad classes, the fish-like Class I and the mammalian Class II. Each of these broad classes can be further separated into families: 153 Class I ORs comprise 27 families, and 1423 Class II ORs can be organized into 133 families. The V1Rs comprise 12 families.
We assigned families and names to rat ORs and V1Rs (see Supplementary table 1) based on earlier methods [4]. The nomenclature for rat ORs is in the format ROR[family number]-[index in the family]. Class I ORs have family numbers smaller than 100 and Class II ORs have family numbers higher than 100. The nomenclature for V1Rs is in the format RV1R[family]-[index in family]. For both ORs and V1Rs, the suffix ‘i’ was added to partial genes, and ‘p’ was added to pseudogenes (see Supplementary table 1).
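The rat OR naming scheme described above lends itself to a small parser. The sketch below is illustrative only: the example names are hypothetical, and it assumes that the 'i' or 'p' suffix directly follows the index with no separator, which the text does not specify. A family number of exactly 100 is not covered by the stated rule and is reported as unspecified.

```python
import re
from typing import NamedTuple, Optional


class RatORName(NamedTuple):
    family: int
    index: int
    or_class: str   # "Class I", "Class II" or "unspecified"
    status: str     # "full-length", "partial" or "pseudogene"


# Assumed layout: ROR<family number>-<index><optional suffix i or p>
_NAME_PATTERN = re.compile(r"^ROR(\d+)-(\d+)([ip]?)$")
_STATUS_BY_SUFFIX = {"": "full-length", "i": "partial", "p": "pseudogene"}


def parse_rat_or_name(name: str) -> Optional[RatORName]:
    """Split a rat OR name into family, index, class and status.

    Returns None when the name does not follow the assumed layout.
    """
    match = _NAME_PATTERN.match(name)
    if match is None:
        return None
    family = int(match.group(1))
    index = int(match.group(2))
    if family < 100:
        or_class = "Class I"
    elif family > 100:
        or_class = "Class II"
    else:
        or_class = "unspecified"
    return RatORName(family, index, or_class, _STATUS_BY_SUFFIX[match.group(3)])


# Hypothetical names, for illustration only.
print(parse_rat_or_name("ROR12-3"))
print(parse_rat_or_name("ROR256-1p"))
```

The same approach would extend to the RV1R names once the form of the V1R family label is fixed.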
In the mouse, the newly discovered ORs and V1Rs were analyzed for phylogenetic relatedness. Because sequences were updated in the new genome assembly, the classification of the OR and V1R repertoire differs slightly from the original one based on the mm2 version of the mouse genome [4; 10] (http://genome.ucsc.edu/). Our updated OR repertoire contains 1194 putative intact genes, compared to 978 in the mm2 version. Our V1R repertoire contains 191 putative intact genes, in contrast to 134 in the mm2 version. This is because of the higher sequence quality and coverage in the new genome assembly, resulting in fewer errors in data mining. We classified the updated sequences into families, and mapped them to those in the old version (see Supplementary table 2). Since the main purpose of this paper is not to compare sequence differences between the two assemblies, we do not discuss them in detail.
Species-specific Families of ORs/V1Rs
We sought to determine which families, if any, had evolved to be species specific in either rat or mouse. To highlight species-specific families, we performed a phylogenetic analysis with all ORs from mouse and rat pooled, and used the same criterion to define a family as applied to ORs from either species separately (see Methods). The total of all rodent ORs can be divided into 245 families, among which only 4 families include only rat ORs, and 12 families are specific for mouse. In total, 20 mouse ORs and 8 rat ORs are contained in these 16 species-specific families, but of these only 4 mouse ORs and 1 rat OR are intact genes. These analyses indicate that at the family level OR genes are closely related in mouse and rat since more than 99% of putatively functional ORs are clustered in families with counterparts from the other species.
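The test for species-specific families amounts to asking, for each family in the pooled classification, whether all of its members come from a single species. The sketch below is a minimal illustration with hypothetical gene and family labels; it is not the authors' program. The final lines reproduce the "more than 99%" figure from the counts given in the text (5 intact ORs in species-specific families out of 1284 + 1194 intact ORs).

```python
from collections import defaultdict


def species_specific_families(assignments):
    """Find families whose members all come from one species.

    assignments: iterable of (gene, species, family) tuples.
    Returns a dict mapping each single-species family to that species.
    """
    species_by_family = defaultdict(set)
    for _gene, species, family in assignments:
        species_by_family[family].add(species)
    return {
        family: next(iter(species))
        for family, species in species_by_family.items()
        if len(species) == 1
    }


# Hypothetical pooled classification, for illustration only.
pooled = [
    ("rat_gene_1", "rat", "F1"),
    ("mouse_gene_1", "mouse", "F1"),
    ("mouse_gene_2", "mouse", "F2"),
    ("mouse_gene_3", "mouse", "F2"),
    ("rat_gene_2", "rat", "F3"),
]
specific = species_specific_families(pooled)

# Counts taken from the text: 4 mouse + 1 rat intact ORs sit in
# species-specific families; 1284 rat + 1194 mouse ORs are intact.
intact_total = 1284 + 1194
shared_pct = 100.0 * (intact_total - (4 + 1)) / intact_total
print(f"intact ORs in shared families: {shared_pct:.1f}%")  # prints 99.8%
```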
Orthologous clusters were also identified and are presented in Fig 2a, where OR clusters and single genes from rat are displayed along the top, and those from mouse along the bottom. As an example, rat cluster 3_68 is orthologous with mouse cluster 2_85, since the counterparts of the ORs from rat cluster 3_68 are always found in mouse cluster 2_85.
Fig.2. Comparison of rodent ORs and V1Rs through phylogenetic analysis.
At the family level, OR genes are more closely related in mouse and rat than V1R genes. All mouse and rat ORs/V1Rs are pooled and classified into subfamilies using the same criterion to define a family as applied to ORs/V1Rs from either species separately. OR/V1R subfamilies are represented in the middle of the figure by a line of small colored bars. ORs/V1Rs belonging to the same subfamily have the same color index. OR/V1R clusters are shown according to their genomic locations on the top and bottom with grey bars proportional to their width on the chromosome. Each OR/V1R gene is represented by one line inside the grey bar with its family color index. Isolated OR/V1R genes are drawn without grey cluster bars. Each cluster is named with the chromosome and position (Mb) connected by an underscore. Thin grey lines from each cluster or isolated gene are connected to their subfamily in the middle. (a) Families of OR genes. To minimize the number of connection lines, only one line is drawn between the cluster bar and the subfamily bar if more than one OR from the same cluster belongs to the same subfamily. Orthologous clusters are highlighted with dark lines and Class I ORs are highlighted with blue lines. In total, mouse and rat ORs are divided into 245 families, among which 4 are specific for rat and 12 for mouse. In these rat- or mouse-specific families there are only 4 intact mouse ORs and 1 intact rat OR, which constitute less than 1% of all putative functional ORs. Class I ORs, which are from rat cluster 1_160 and mouse cluster 7_90, are the most conserved in terms of genomic location. (b) Families of V1R genes. One grey line is drawn from the relative position of each V1R gene in its genomic cluster to the subfamily that it belongs to. Species-specific families are highlighted with red (for rat) and green (for mouse) circles. 10 families are specific for rat and 14 for mouse; they contain 35 rat V1Rs and 79 mouse V1Rs, of which 62 are putatively intact, constituting about 20 percent of the functional V1R repertoire.
Additionally we found that many homologous regions were located around these orthologous clusters (data not shown). This could be an indication of segmental duplication across the two species, which might also account for the close relationship between the receptors. Alternatively, these regions could contain regulatory motifs as these non-coding regions appear not to have been subject to the expected random mutation.
V1R genes pooled from mouse and rat were subjected to a similar phylogenetic analysis, but with quite different results, as shown in Fig 2b. Rat V1R clusters and single genes are displayed on the top, and mouse V1Rs on the bottom. The pooled rodent V1R genes can be divided into 53 families, among which 10 families are specific for rat, and 14 for mouse. In total, 79 mouse V1Rs and 35 rat V1Rs are contained in these 24 species-specific families. Of these, 42 mouse-specific V1Rs and 20 rat-specific V1Rs are putatively intact genes, which together constitute about 20 percent of the functional V1R repertoire. In particular, 68 V1Rs from mouse cluster 13_21 form 7 families in which only 2 rat V1Rs appear. Most of the rat-specific family members are from families c and g, and most of the mouse-specific family members are from families d and h [11].
The above analysis indicates that, in contrast to the OR genes, at the family level V1R genes tend to be specific to each species, though the mouse and rat diverged only a relatively short time ago. The higher species-specificity of V1R genes may be understandable in the context of the species-specific behaviors they are believed to mediate, while the ORs, with their primary role in detecting environmental odors, are common to both species.
Similarity Level of ORs and V1Rs
The above phylogenetic analysis did not determine how similar ORs or V1Rs are between the two species relative to receptor genes within the same species. We therefore performed a similarity analysis with BLAST to identify the best hit of each OR or V1R gene in the other species and within the same species, using the percent identity at the protein level.
Comparison of Orthologous OR/V1Rs
In Fig 3a, each OR gene (left panel) and V1R gene (right panel) was placed in the bin corresponding to its similarity level with its cross-species counterpart. The colors represent the family membership (according to Fig 2) of each OR or V1R gene. In the top bars, rat genes were compared against the mouse genes, while in the bottom bars, mouse genes were compared against the rat genes. The percentage of OR/V1R genes at different similarity levels is shown in Table 3. From the left panel, about 69% of rat ORs and 73% of mouse ORs have orthologs with over 80 percent identity, while only about 15% of rat and mouse V1Rs have orthologs at the same identity level. Most prominently, no V1R gene has an ortholog with over 90 percent identity. Among ORs, Class I ORs are even more conserved between mouse and rat than Class II ORs (see Table 3). These analyses provide additional evidence, now at the individual gene level, that OR genes are more conserved between mouse and rat than V1R genes.
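The binning step behind Fig 3 and Table 3 can be sketched as follows: each gene's best-hit percent identity is assigned to one of five ranges, and the share of genes in each range is reported. This is an illustrative sketch; the treatment of boundary values (for example, exactly 80% counted in the 80-90% range) is an assumption, and the identities in the example are hypothetical.

```python
from collections import Counter

# Lower bound of each range, highest first; anything below 60 falls in "<60%".
IDENTITY_BINS = [(90.0, ">=90%"), (80.0, "80-90%"), (70.0, "70-80%"), (60.0, "60-70%")]
BIN_LABELS = [label for _, label in IDENTITY_BINS] + ["<60%"]


def identity_bin(percent_identity: float) -> str:
    """Return the Table 3 style range label for a best-hit percent identity."""
    for lower_bound, label in IDENTITY_BINS:
        if percent_identity >= lower_bound:
            return label
    return "<60%"


def bin_percentages(best_hit_identities):
    """Percentage of genes in each identity range, keyed by range label."""
    counts = Counter(identity_bin(p) for p in best_hit_identities)
    total = len(best_hit_identities)
    return {label: 100.0 * counts[label] / total for label in BIN_LABELS}


# Hypothetical best-hit identities for five genes.
table_row = bin_percentages([95.0, 85.0, 85.0, 72.0, 40.0])
```

Running the same function once on cross-species best hits and once on same-species best hits gives the ortholog and paralog rows for a gene type.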
Fig.3. Similarity analysis of OR/V1R genes.
Each gene is represented at the position of the similarity range with its most related sequence. A color index is assigned to each OR/V1R family defined from either species separately. Mouse or rat OR/V1R genes from the same family are in the same color. The height of each bar is proportional to the number of genes in that similarity range. OR genes appear on the left panels, and V1R genes on the right. (a) Comparison between the best orthologs from the other species. In each panel, genes from rat are on the top, and those from mouse are on the bottom. About 70% of OR genes have orthologs with more than 80 percent identity, whereas less than 20% of V1Rs have orthologs in this range, indicating that OR genes are more conserved between mouse and rat than V1R genes. (b) Comparison between best paralogs from the same species. In the left panel, 57% of rat ORs and 52% of mouse ORs have a paralog with over 80 percent identity. In the right panel, 41% of rat V1Rs and 63% of mouse V1Rs have a paralog at the same level. From the left panel, OR genes are more conserved with their orthologs than with their paralogs. From the right, V1R genes are more similar to their paralogs than to their orthologs.
Table 3.
Percentage of OR/V1R genes at certain similarity level with their counterparts or neighbors
Gene type | Compared species | >=90% | 80-90% | 70-80% | 60-70% | <60% |
---|---|---|---|---|---|---|
Class I OR | R vs M | 43.1 | 44.4 | 9.2 | 2.0 | 1.3 |
Class I OR | M vs R | 44.3 | 34.2 | 10.1 | 3.2 | 8.2 |
Class I OR | R vs R | 22.2 | 36.6 | 10.5 | 16.3 | 14.3 |
Class I OR | M vs M | 16.5 | 29.7 | 17.1 | 12.0 | 24.7 |
Class II OR | R vs M | 22.6 | 43.6 | 18.8 | 7.4 | 7.6 |
Class II OR | M vs R | 25.5 | 46.8 | 17.6 | 6.4 | 3.8 |
Class II OR | R vs R | 20.7 | 36.1 | 22.3 | 12.4 | 8.6 |
Class II OR | M vs M | 22.5 | 30.3 | 23.2 | 13.4 | 10.6 |
V1R | R vs M | 0.0 | 16.6 | 34.2 | 17.6 | 31.6 |
V1R | M vs R | 0.0 | 13.0 | 35.4 | 21.4 | 30.2 |
V1R | R vs R | 12.8 | 27.3 | 30.0 | 10.2 | 19.7 |
V1R | M vs M | 30.5 | 32.8 | 15.9 | 9.1 | 11.7 |
R: rat; M: mouse. Values are the percentage of genes (%) at each identity level.
Comparison of Paralogous OR/V1Rs
Using the same method, we analyzed the similarity between OR or V1R paralogs as shown in Fig 3b. On the top, rat OR/V1R genes are compared to their paralogs, and on the bottom, mouse OR/V1R genes are compared to their paralogs. The percentage of genes at each similarity level is shown in Table 3. In the left panel, 57% of rat ORs and 52% of mouse ORs have a paralog with over 80 percent identity. Note that these fractions are lower than those found in the cross-species ortholog comparisons above (69% and 73%). In the right panel, however, 41% of rat V1Rs and 63% of mouse V1Rs have a paralog at the same level, fractions that are much higher than those found in the cross-species ortholog comparisons (about 15%). Class I ORs and Class II ORs do not show significant differences in terms of paralogous comparison, while mouse V1Rs are more conserved with their paralogs than rat V1Rs (see Table 3).
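The OR paralog fractions quoted above can be recovered from Table 3 as a gene-count-weighted average of the Class I and Class II rows. The sketch below is a worked check rather than part of the original analysis. It assumes that the Table 3 percentages are taken over all genes of each class, using 153 Class I and 1423 Class II rat ORs, and 158 Class I and 1217 Class II mouse ORs (the last figure derived as 1375 minus 158).

```python
def pooled_percentage(class_rows):
    """Gene-count-weighted average of per-class percentages.

    class_rows: list of (n_genes, percentage) pairs, one per OR class.
    """
    total_genes = sum(n for n, _ in class_rows)
    weighted = sum(n * pct for n, pct in class_rows)
    return weighted / total_genes


# Share of genes whose best paralog exceeds 80% identity:
# the ">=90%" and "80-90%" columns of Table 3 added together.
rat_paralog = pooled_percentage([
    (153, 22.2 + 36.6),    # Class I, R vs R
    (1423, 20.7 + 36.1),   # Class II, R vs R
])
mouse_paralog = pooled_percentage([
    (158, 16.5 + 29.7),    # Class I, M vs M
    (1217, 22.5 + 30.3),   # Class II, M vs M
])

print(f"rat ORs with a paralog above 80% identity: {rat_paralog:.0f}%")      # prints 57%
print(f"mouse ORs with a paralog above 80% identity: {mouse_paralog:.0f}%")  # prints 52%
```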
These similarity analyses show again that OR genes are more conserved across the two species than V1R genes. However, they also reveal that ORs in mouse and rat have a higher similarity with their orthologs than with their paralogs. As expected, the opposite appears to be the case for the V1R genes. Measuring the number of nucleotide substitutions per codon for reciprocal mouse-rat orthologs, we found that the rate was lower for the ORs than for the V1Rs (ORs: 779 pairs, 0.241 ± 0.091; V1Rs: 45 pairs, 0.369 ± 0.078), further suggesting a difference in evolutionary rates between the two types of chemoreceptors.
Motif Analysis of ORs and V1Rs
To gain a deeper sense of differences at the level of individual residues, we applied the pattern recognition algorithm MEME [18] to the complete, combined databases of rodent ORs and V1Rs to generate an exhaustive set of motifs. MEME identifies recurring motifs of variable length and reports the number of genes that carry each motif, termed “support sequences”. By this analysis we identified 100 OR motifs and 83 V1R motifs from the pooled OR and V1R sequences of both mouse and rat. The motifs are 7 to 20 residues long and supported by more than 5 genes. Detailed information for each motif is listed in Supplementary Tables 3&4 (see Supplementary Information).
Motifs of ORs
For OR genes, the 100 motifs discovered from the pool of 2478 mouse and rat intact ORs are plotted schematically in Fig 4a. They tend to occur in transmembrane domains (TM) 4 and 5 and extracellular loop (EC) 3. Twelve of the generated motifs are highly conserved in the complete database (i.e., are present in more than 90% of ORs). The ‘MAYDRYVAIC[KN]P’ motif, which was previously found to be the most conserved motif in mouse, also stands out as the most conserved in our analysis with rat ORs added. For each motif we identified the species from which the support ORs come. Somewhat surprisingly, we found no motif that is present only in mouse or only in rat ORs, although the number of support sequences from each species may vary considerably. For example, motif 85 (see Supplementary table 3) occurs in 16 rat ORs but in 26 mouse ORs.
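Tallying support sequences per species for a motif such as ‘MAYDRYVAIC[KN]P’ can be sketched with a regular expression, since the bracket notation in the motif string is also valid regular expression syntax. This is a simplification for illustration: MEME scores motifs probabilistically, whereas the sketch requires an exact match, and the protein fragments below are made up rather than taken from the database.

```python
import re
from collections import Counter

# The motif string from the text; [KN] means either K or N at that position.
MOTIF = re.compile(r"MAYDRYVAIC[KN]P")


def count_support(sequences, motif=MOTIF):
    """Count, per species, the sequences that contain the motif.

    sequences: iterable of (species, protein_sequence) pairs.
    """
    support = Counter()
    for species, protein in sequences:
        if motif.search(protein):
            support[species] += 1
    return support


# Hypothetical protein fragments, for illustration only.
toy_sequences = [
    ("rat", "LLSMAYDRYVAICKPLHY"),     # matches with K
    ("mouse", "AAMAYDRYVAICNPLRY"),    # matches with N
    ("mouse", "AAMAYDRYVAICSPLRY"),    # S at the variable position: no match
]
support = count_support(toy_sequences)
species_specific = len(support) == 1   # True only if a single species carries the motif
```

A motif would be called species-specific when its support sequences all come from one species, which is the check applied to both the OR and the V1R motif sets.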
Fig. 4. Motif analysis of OR and V1R sequences.
Recurring motifs were identified from the pool of mouse and rat intact ORs/V1Rs with the MEME program. This was followed by checking the identity of the support sequences in each species. Each motif is represented by a bar proportional to its width, positioned using OR I7 and VN3 as the references for the TMs (transmembrane domains). The color index varying from white to black indicates the conservation of each motif, measured as the number of support sequences. The motif numbers are listed on the top of each bar. Further details are shown in Supplementary tables 3&4. Red and green circles highlight motifs that are specific for one species. (a) Motifs of OR genes. 100 motifs supported by more than 10 sequences are listed, none of which are specific for either mouse or rat. 10 motifs are specific for Class I ORs, and 59 are specific for Class II. The rest are shared by both classes. (b) Motifs of V1R genes. 83 motifs supported by more than 5 V1R sequences are listed. 12 of them reside only in mouse V1Rs and 3 of them reside only in rat V1Rs. The rest are shared by both species.
As mentioned above, ORs can be classified into two classes, Class I (fish-like) and Class II (mammalian-like), and then further organized into phylogenetic families. We also examined the support sequences for each motif to determine its specificity for the Class I/II subgroup and for the family subgroup. Among the 100 motifs, 31 are shared by both classes, whereas 10 motifs are specific to Class I ORs and 59 motifs are specific to Class II ORs. In the family subgroups, 75 motifs are carried by ORs from multiple families. Among the remaining 25 motifs that are specific for one family, only one is specific for a Class I family, while the other 24 are specific for Class II families.
Some of the most highly conserved and widely supported motifs are also well recognized in other GPCRs. For example, the four cysteines, occurring in EC3 and TM3, are thought to form two disulfide bonds [19], as they do in rhodopsins and amine receptors [20]. An aspartic acid and an asparagine, occurring in motifs 4 and 2 respectively, are thought to interact with each other to stabilize the overall structure, as they do in bovine rhodopsin [21].
Motifs of V1Rs
For V1Rs, we discovered 83 motifs from the pool of 306 intact V1R genes from mouse and rat, using the same method as for ORs. As shown in Fig. 4b and Supplementary table 4, and clearly distinct from ORs, the V1R motifs are distributed in a flat pattern, mainly because variable motifs are scattered from the N-terminus to the C-terminus, except in TM1, TM3 and TM5. Among the 83 motifs, 6 are highly conserved in the rodent V1R database (present in over 90% of V1Rs). These motifs display no sequence similarity with the OR motifs. Of the 83 motifs, 12 are specific for mouse and 3 are specific for rat. The remainder are shared by both species, although again the distribution of support sequences differs. For example, motif 14 is supported by 63 V1Rs, 55 of which are mouse V1Rs and 8 of which are rat V1Rs. There are fewer specific motifs in rat, probably because the rat genome has only about 60% as many intact V1Rs as the mouse (115 versus 191). In contrast to the ORs, for which no species-specific motif was found, the 15 motifs represented only in mouse (12) or only in rat (3) again indicate that V1R genes are more species-specific than OR genes, now at the motif level of analysis.
We also investigated the family composition of the support sequences for each of the V1R motifs. Of the 83 motifs, 43 are supported by V1Rs from multiple families, and the remaining 40 are each supported by a single family, regardless of species. Compared to ORs, V1Rs have more motifs supported by single families. This indicates that V1Rs have more locally conserved motifs than ORs.
Summary
We have performed a systematic comparative analysis of two large gene families comprising the chemosensory receptors from two species, rat and mouse, that diverged relatively recently, about 24 million years ago. The two superfamilies are expressed in two independent chemosensory systems: the main olfactory system with its odorant receptors (ORs) and the vomeronasal system with its V1R receptors. The OR system is thought to be primarily concerned with detecting environmental odors (e.g., food, predators), while the vomeronasal system is involved in conspecific recognition (e.g., mates, competitors).
The main finding of our comparative analysis is that the OR genes are substantially more similar between the two species than the V1R genes. Stated another way, the V1R gene repertoire appears more species-specific than the OR repertoire. This result appears at three levels of analysis: phylogenetic relatedness, sequence similarity and shared motifs. Although ORs represent a quite diverse family of genes within the same species, they are more conserved between these two species. Precisely the opposite tendency is seen for the V1R genes, which mostly diverge between the two species, sharing less relatedness, lower similarity and fewer motifs.
From our analysis, we found that both OR and V1R genes tend to form compact clusters with a non-uniform distribution on the chromosomes. However, OR and V1R genes do not intermingle. In addition, phylogenetically close OR/V1R genes are always in close proximity to one another, suggesting that the generation of OR/V1R genes is due not only to large-scale segmental duplications but also to local duplications. There are only a few isolated OR/V1R genes, and most of them are pseudogenes. Overall, the OR genes appear to have evolved more slowly since the divergence of the two species, while the V1Rs, presumably due to their role in reproductive behaviors, have evolved more rapidly in the separate species.
Methods
Data mining
An exhaustive TBLASTN search incorporating a profile HMM (hidden Markov model) search was used to obtain all possible rat OR/V1R sequences from the rn3 assembly, whereas a less sensitive BLAT search was used in place of TBLASTN to update the mouse OR/V1R sequences from the mm5 assembly of the mouse genome. All genome sequences were downloaded from UCSC (http://genome.ucsc.edu/). Conceptual translation was used to recover the original ORFs of possible pseudogenes. The profile HMM from the HMMER package was used to calculate the p-value for each OR/V1R candidate. Duplicates were removed, and the resulting genes were subjected to the following analyses.
Phylogenetic Analysis
The protein sequences encoded by the OR or V1R genes were aligned using ClustalX 1.81. The resulting multiple alignment was used as input to PAUP* 4.0 beta 10 (Sinauer Associates, Sunderland, Massachusetts), from which the majority-rule consensus neighbor-joining (NJ) tree was obtained. OR/V1R gene families were determined from the tree as the largest clades that fulfilled two criteria: the clade had >50% bootstrap support, and all members within the clade had at least 40% protein identity.
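The two family criteria can be expressed as a simple test on one candidate clade. The sketch below is only an illustration under stated assumptions: the helper names `percent_identity` and `qualifies_as_family` are hypothetical, identity is computed over alignment columns that are not gaps in both sequences (the text does not specify the denominator), and the search for the largest qualifying clade in the tree is left out.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored; this
    denominator is an assumption, not something the source specifies.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)

def qualifies_as_family(aligned_members, bootstrap_support,
                        min_bootstrap=50.0, min_identity=40.0):
    """Apply the two stated criteria to one candidate clade.

    The clade needs more than min_bootstrap percent bootstrap support,
    and every pair of members needs at least min_identity percent identity.
    """
    if bootstrap_support <= min_bootstrap:
        return False
    return all(percent_identity(a, b) >= min_identity
               for a, b in combinations(aligned_members, 2))

# Hypothetical aligned fragments standing in for full-length receptors.
toy_clade = ["MKTLLA-V", "MKTLIA-V", "MRTLLAGV"]
accepted = qualifies_as_family(toy_clade, bootstrap_support=72.0)
```

To obtain families as defined in the text, such a test would be applied while walking the tree from the root towards the leaves, keeping the first (largest) clade on each path that passes.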
Similarity Comparison
The BLAST program was run locally with default parameters to identify the best hit for each sequence. Other analyses were performed with investigator-developed Perl scripts, which are available from the authors upon request.
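A best-hit extraction of this kind might look like the sketch below. It assumes the BLAST results are available in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the text does not state which output format was used. The function name `best_hits`, the `mm_`/`rn_` name prefixes and the toy rows are hypothetical, and the code is written in Python rather than the authors' Perl.

```python
def best_hits(tabular_lines, species_of):
    """Return {query: {species: (subject, bitscore)}} from tabular BLAST lines.

    Self hits are skipped; within each species the highest bit score wins.
    species_of maps a sequence name to a species label.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query == subject:
            continue
        per_species = best.setdefault(query, {})
        species = species_of(subject)
        if species not in per_species or bitscore > per_species[species][1]:
            per_species[species] = (subject, bitscore)
    return best

# Hypothetical rows; the numbers are made up for illustration only.
rows = [
    ("mm_OR1", "mm_OR1", "100.0", "310", "0", "0", "1", "310", "1", "310", "0.0", "640"),
    ("mm_OR1", "rn_OR1", "92.3", "310", "24", "0", "1", "310", "1", "310", "1e-170", "560"),
    ("mm_OR1", "mm_OR2", "80.0", "310", "62", "0", "1", "310", "1", "310", "1e-140", "470"),
]
lines = ["\t".join(r) for r in rows]
hits = best_hits(lines, species_of=lambda name: name.split("_")[0])
```

Comparing `hits[q]["rn"]` with `hits[q]["mm"]` for a mouse query `q` then shows whether the gene is closer to its cross-species counterpart or to its nearest neighbor in the same genome.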
Motif Discovery
Only intact, full-length OR/V1R sequences were used for motif discovery. MEME programs were downloaded from http://meme.sdsc.edu/meme/website/meme-download.html and installed on a local server. The MEME program produced conserved motifs from each OR/V1R database, with widths between 5 and 20 amino acids. Useful motifs were selected on the basis of E-value and number of support sequences: only motifs with an E-value lower than 1e-10 and at least 5 support sequences were retained. Perl scripts were developed to extract the names of the genes containing each motif and to determine whether the motif is shared by both mouse and rat or specific to one of them.
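The selection rule stated above (E-value below 1e-10 and at least 5 support sequences) can be written as a small filter. This is a minimal sketch, not the authors' Perl scripts: the `Motif` record, the function `select_motifs` and the toy entries are hypothetical, and parsing of MEME output is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Motif:
    name: str
    e_value: float
    support: list = field(default_factory=list)  # names of genes carrying the motif

def select_motifs(motifs, max_e_value=1e-10, min_support=5):
    """Keep motifs with E-value below the cutoff and enough support sequences."""
    return [m for m in motifs
            if m.e_value < max_e_value and len(m.support) >= min_support]

# Hypothetical candidates: one passes, one has too few support sequences,
# and one has an E-value above the cutoff.
candidates = [
    Motif("m1", 1e-50, ["g1", "g2", "g3", "g4", "g5"]),
    Motif("m2", 1e-50, ["g1", "g2"]),
    Motif("m3", 1e-5, ["g%d" % i for i in range(10)]),
]
selected = select_motifs(candidates)
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the motif counts are to the chosen cutoffs.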
Supplementary Material
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- 1. Feinstein P, Bozza T, Rodriguez I, Vassalli A, Mombaerts P. Axon guidance of mouse olfactory sensory neurons by odorant receptors and the beta2 adrenergic receptor. Cell. 2004;117:833–46. doi: 10.1016/j.cell.2004.05.013.
- 2. Buck L, Axel R. A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell. 1991;65:175–87. doi: 10.1016/0092-8674(91)90418-x.
- 3. Mombaerts P. Seven-transmembrane proteins as odorant and chemosensory receptors. Science. 1999;286:707–11. doi: 10.1126/science.286.5440.707.
- 4. Zhang X, Firestein S. The olfactory receptor gene superfamily of the mouse. Nat Neurosci. 2002;5:124–33. doi: 10.1038/nn800.
- 5. Young JM, Trask BJ. The sense of smell: genomics of vertebrate odorant receptors. Hum Mol Genet. 2002;11:1153–60. doi: 10.1093/hmg/11.10.1153.
- 6. Glusman G, Yanai I, Rubin I, Lancet D. The complete human olfactory subgenome. Genome Res. 2001;11:685–702. doi: 10.1101/gr.171001.
- 7. Olender T, Fuchs T, Linhart C, Shamir R, Adams M, Kalush F, Khen M, Lancet D. The canine olfactory subgenome. Genomics. 2004;83:361–72. doi: 10.1016/j.ygeno.2003.08.009.
- 8. Zhang X, Rogers M, Tian H, Zou DJ, Liu J, Ma M, Shepherd GM, Firestein SJ. High-throughput microarray detection of olfactory receptor gene expression in the mouse. Proc Natl Acad Sci U S A. 2004;101:14168–14173. doi: 10.1073/pnas.0405350101.
- 9. Young JM, Shykind BM, Lane RP, Tonnes-Priddy L, Ross JA, Walker M, Williams EM, Trask BJ. Odorant receptor expressed sequence tags demonstrate olfactory expression of over 400 genes, extensive alternate splicing and unequal expression levels. Genome Biol. 2003;4:R71. doi: 10.1186/gb-2003-4-11-r71.
- 10. Zhang X, Rodriguez I, Mombaerts P, Firestein S. Odorant and vomeronasal receptor genes in two mouse genome assemblies. Genomics. 2004;83:802–11. doi: 10.1016/j.ygeno.2003.10.009.
- 11. Rodriguez I, Del Punta K, Rothman A, Ishii T, Mombaerts P. Multiple new and isolated families within the mouse superfamily of V1r vomeronasal receptors. Nat Neurosci. 2002;5:134–40. doi: 10.1038/nn795.
- 12. Young JM, Kambere M, Trask BJ, Lane RP. Divergent V1R repertoires in five species: Amplification in rodents, decimation in primates, and a surprisingly small repertoire in dogs. Genome Res. 2005;15:231–40. doi: 10.1101/gr.3339905.
- 13. Del Punta K, Leinders-Zufall T, Rodriguez I, Jukam D, Wysocki CJ, Ogawa S, Zufall F, Mombaerts P. Deficient pheromone responses in mice lacking a cluster of vomeronasal receptor genes. Nature. 2002;419:70–4. doi: 10.1038/nature00955.
- 14. Rodriguez I. Pheromone receptors in mammals. Horm Behav. 2004;46:219–30. doi: 10.1016/j.yhbeh.2004.03.014.
- 15. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–62. doi: 10.1038/nature01262.
- 16. Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, Glodek A, Gu Z, Jennings D, Kraft CL, Nguyen T, Pfannkoch CM, Sitter C, Sutton GG, Venter JC, Woodage T, Smith D, Lee HM, Gustafson E, Cahill P, Kana A, Doucette-Stamm L, Weinstock K, Fechtel K, Weiss RB, Dunn DM, Green ED, Blakesley RW, Bouffard GG, De Jong PJ, Osoegawa K, Zhu B, Marra M, Schein J, Bosdet I, Fjell C, Jones S, Krzywinski M, Mathewson C, Siddiqui A, Wye N, McPherson J, Zhao S, Fraser CM, Shetty J, Shatsman S, Geer K, Chen Y, Abramzon S, Nierman WC, Havlak PH, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Li B, Liu Y, Qin X, Cawley S, Cooney AJ, D’Souza LM, Martin K, Wu JQ, Gonzalez-Garay ML, Jackson AR, Kalafus KJ, McLeod MP, Milosavljevic A, Virk D, Volkov A, Wheeler DA, Zhang Z, Bailey JA, Eichler EE, Tuzun E, et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004;428:493–521. doi: 10.1038/nature02426.
- 17. Adkins RM, Gelke EL, Rowe D, Honeycutt RL. Molecular phylogeny and divergence time estimates for major rodent groups: evidence from multiple genes. Mol Biol Evol. 2001;18:777–91. doi: 10.1093/oxfordjournals.molbev.a003860.
- 18. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994;2:28–36.
- 19. Zozulya S, Echeverri F, Nguyen T. The human olfactory receptor repertoire. Genome Biol. 2001;2:RESEARCH0018. doi: 10.1186/gb-2001-2-6-research0018.
- 20. Ballesteros JA, Shi L, Javitch JA. Structural mimicry in G protein-coupled receptors: implications of the high-resolution structure of rhodopsin for structure function analysis of rhodopsin-like receptors. Mol Pharmacol. 2001;60:1–19.
- 21. Palczewski K, Kumasaka T, Hori T, Behnke CA, Motoshima H, Fox BA, Le Trong I, Teller DC, Okada T, Stenkamp RE, Yamamoto M, Miyano M. Crystal structure of rhodopsin: A G protein-coupled receptor. Science. 2000;289:739–45. doi: 10.1126/science.289.5480.739.