- About this Journal ·
- Abstracting and Indexing ·
- Advance Access ·
- Aims and Scope ·
- Article Processing Charges ·
- Articles in Press ·
- Author Guidelines ·
- Bibliographic Information ·
- Contact Information ·
- Editorial Board ·
- Editorial Workflow ·
- Free eTOC Alerts ·
- Publication Ethics ·
- Reviewers Acknowledgment ·
- Submit a Manuscript ·
- Subscription Information ·
- Table of Contents
Journal of Viruses
Volume 2014 (2014), Article ID 329049, 7 pages
Natural Selection Determines Synonymous Codon Usage Patterns of Neuraminidase (NA) Gene of the Different Subtypes of Influenza A Virus in Canada
1Department of Zoology, University of British Columbia, Vancouver, Canada V6T 1Z4
2Department of Renewable Resources, University of Alberta, Edmonton, Canada T6G 2H1
Received 15 April 2013; Revised 1 July 2013; Accepted 25 August 2013; Published 2 January 2014
Academic Editor: Sílvia Bofill-Mas
Copyright © 2014 Youhua Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Synonymous codon usage patterns of neuraminidase (NA) gene of 64 subtypes (one is a mixed subtype) of influenza A virus found in Canada were analyzed. In total, 1422 NA sequences were analyzed. Among the subtypes, H1N1 is the prevailing one with 516 NCBI accession records, followed by H3N2, H3N8, and H4N6. The year of 2009 has the highest report records for the NA sequences in Canada, corresponding to the 2009 pandemic event. Correspondence analysis on the RSCU values of the four major subtypes showed that they had distinct clustering patterns in the two-dimensional scatter plot, indicating that different subtypes of IAV utilized different preferential codons. This subtype clustering pattern implied the important influence of natural selection, which could be further evidenced by an extremely flattened regression line in the neutrality plot (GC12 versus G3s plot) and a significant phylogenetic signal on the distribution of different subtypes in the clades of the phylogenetic tree ( statistic). In conclusion, different subtypes of IAV showed an evolutionary differentiation on choosing different optimal codons. Natural selection played a deterministic role to structure IAV codon usage patterns in Canada.
Codon usage is not a random event . Codon usage bias has been broadly observed, and different mechanisms have been proposed to explain the bias patterns, for example, mutation pressure, translational efficiency, gene length , dinucleotide bias , tRNA abundance , organ specificity , and so on. Codon usage bias patterns have been broadly studied in recent years, especially for virus genomes [3, 6, 7].
In recent years, codon usage patterns have been widely explored for influenza viruses [6, 8–12]. Among the three influenza viruses, influenza A virus (IAV) is the major concern since it has a lot of subtypes. IAV is a genus of Orthomyxoviridae family of viruses, which caused influenza in birds and mammals [6, 13]. Among the eight RNA segments of IAV, hemagglutinin and neuraminidase (NA) genes are the principal concerns.
Currently, most of modeling efforts on IAV are focused on Asian regions [6, 8]; little attention is paid on the evolutionary patterns of IAV in local areas of North America. To fill such a knowledge gap, in the present study, I analyzed all the available 1436 NA ORFs for IAV found in Canada to reveal the codon usage patterns of IAV different subtypes in Canada.
2. Materials and Methods
2.1. Sequence Data
1436 NA sequences found in IAV strains of Canada were extracted from NCBI GenBank database (http://www.ncbi.nlm.nih.gov/). Only the open reading frames (ORFs) are considered for codon usage bias analysis. The alignment of the large number of NA sequences is done using MAFFT software [14, 15]. The dataset and the alignment will be available at the online digital repository website: http://datadryad.org/.
2.2. Measure of Codon Usage Patterns
Relative synonymous codon usage values of each codon in a gene are calculated to investigate the characteristics of synonymous codon usage. The RSCU index is calculated as follows : where is the observed number of the th codon for the jth amino acid which has kinds of synonymous codons. Higher (or lower) RSCU values, higher (or lower) frequencies of the codons being chosen. When the corresponding RSCU values of a codon are close to 1, it is used randomly and evenly.
2.3. Effective Number of Codons
The effective number of codons (ENCs) is a measure of bias from equal codon usage in a gene . The calculation formula is where (k = 2, 3, 4, 6) is the mean of values for the k-fold degenerate amino acids, which is estimated using the formula as follows: where is the total number of occurrences of the codons for that amino acid and where is the total number of occurrences of the th codon for that amino acid.
ENC ranges from 20 for the strongest bias (where only one codon is used for each amino acid) to 61 for no bias (where all synonymous codons are used equally).
For revealing the relationship between GC3s and ENC values, the expected ENC values for different GC3s were calculated as follows: where denotes the value of GC3s . These expected ENC values can form a unimodal curve. As such, the observed and expected ENC values can be compared to determine the influence of nucleotide compositional constraint on structuring synonymous codon usage bias. If the observed data points are on or nearby the null expected ENC curve, then compositional constraint plays the major role in structuring codon usage patterns. If the observed points are fallen below the null curve, then the role of natural selection emerges.
2.4. Correspondence Analysis and Correlation Analysis
Correspondence analysis (COA) was conducted to investigate the major trend involved in the codon usage patterns of the genomes from different virus strains, which were measured by corresponding RSCU values for the 59 codons (after excluding Met, Trp, and three termination codons) . COA is performed using “ca” package  under R environment .
The first two axes of COA (CA1 and CA2) were then subjected to correlation analysis. A Spearman’s rank correlation analysis was used to infer the relationships between different codon usage indices and the first two axes (CA1 and CA2) of IAV RSCU values.
2.5. Phylogenetic Signals of Different Subtypes of IAV for NA Gene
Phylogenetic signals revealed the clustering and overdispersion (or randomness) of the different subtypes in the phylogenetic tree. If different subtypes tend to group in the same clade, then the clustering pattern could be found and the phylogenetic signal should be detected. In contrast, if different subtypes tend to disperse across different clades, then the overdispersion pattern could be found and the phylogenetic signal should be minor. I implemented two metrics for testing phylogenetic signals for the major subtypes of IAV found in Canada: Blomberg’s K statistic  and Pagel’s statistic [21, 22]. Both tests were implemented in R  package “phytools” .
For inferring phylogenetic signals, a phylogenetic tree is required. Given that we have over 1400 NA sequences, we used the fastest software, FastTree , to construct a maximum likelihood (ML) tree, which is based on the sequences derived from the major subtypes (H1N1, H3N2, H3N8, and H4N6). The reconstructed ML tree will be available from the online digital repository website: http://datadryad.org/. The multichotomies of the tree are resolved using R  package “ape” , and the dating of divergence time is done using the mean path length method . Negative branch lengths have occurred after molecular dating but do influence the inference of phylogenetic signal test.
3. Results and Discussion
3.1. Count and Temporal Distribution of Subtypes of Influenza A Viruses in Canada
As shown in Figure 1(a), the most prevailing subtype of IAV in Canada is H1N1 (with 516 records), followed by H3N2 (with 132 records), H3N8 (with 152 records), and H4N6 (with 113 records).
Based on strain records (Figure 1(b)), it was found that the year of 2009 has the highest number of IAV NA genes being sequenced with a number of 531, followed by the year of 2007 (with a number of 204) and the year of 1968 (66 records).
3.2. ENC-GC3s and Neutrality Plots
As shown in Figure 2, most points lay below the null curve considerably, while some points from H1N1 were very far from the null curve, indicating the importance of selection.
Neutrality plot of Figure 3(a) showed that natural selection should play a very crucial role in structuring the overall codon usage patterns across different subtypes of IAV in Canada. The relative influence of mutation pressure is very small on the first and second codon positions (GC12) with respect to that on the third codon position (GC3), as indicated by the slope of the regression line (0.0158). Such a conclusion is enforced since this regression slope is not significant (), which suggests that there is no clear trend between GC12 and GC3 across different subtypes. Such a decoupling is argued to be structured by natural selection [27, 28].
Furthermore, when the neutrality plots are applied to each of the major subtypes (Figures 3(b), 3(c), 3(e), and 3(f)), most of them are found to have near-zero or negative regression slopes (indicated by correlation coefficients and associated values). As such, natural selection is exclusively important to structure codon usage pattern of NA gene within a single subtype and between different subtypes of IAV in Canada.
3.3. Clustering Patterns of Codon Usage Patterns of Major Subtypes in Canada
As shown in the COA plot of the first two axes in Figure 4, apparently there are distinct clustering patterns among the first four subtypes of IAV. However, the mixed subtype group showed a dispersing pattern, which could be inferred that these mixed sequences are principally generated from subtypes H3N8 and H4N6 because most mixed sequences were well overlapped with the positions of these two known subtype groups.
With respect to the codon usage bias patterns of the four major subtypes (H1N1, H3N2, H3N8, and H4N6), it is found that they did show distinct preferences on choosing optimal codons (Table 1). For subtype H1N1, the most preferred codons (standard: RSCU > 1.2 being the highest in that subtype) were GCU, CUA, UUA, AGA, and AGU in comparison to other subtypes. For subtype H3N2, the most preferred codons were CAU, CCU, AGC, and UAU, respectively. For subtype H3N8, the most preferred codons were GCA, AUU, AAA, AAU, AGG, and ACU. For subtype H4N6, the most important codons were GAA, UUA, GGA, AUA, UUG, CCA, UCA, GUA, and GUG. Apparently, different subtypes tended to congruently use A-end codons in most cases. Such an observation is consistent with many previous studies working on specific subtypes of IAV [6, 8]. Interestingly, no most preferred codons were found for mixed subtype group (Table 1). This is completely consistent to the clustering patterns in Figure 4.
3.4. The Determinants of Codon Usage Patterns of Different Subtypes of IAV in Canada
As shown in the correlation between different codon usage indices and the first two axes of COA (Table 2), it is evidenced that the C3s and G3s individually are the principal drivers for determining the first major trend CA1 of codon usage bias patterns of IAV in Canada. Interestingly, GC3s, the combined G3s and C3s, is not a strong determinant of the first major trend CA1. Comparatively, A3s and A contents are the major predictors of the second major trend of codon usage patterns of different subtypes of IAV.
3.5. Phylogenetic Clustering Patterns of Different Subtypes of IAV in Canada
Blomberg’s K statistic failed to return a result because the phylogenetic covariance matrix based on the inferred ML tree is singular when the inverse of the matrix is taken. However, Pagel’s statistic could be implemented. The randomization test (1000 runs) on the statistic for testing phylogenetic signals showed a significant result (, log-likelihood = −1480.885, and ). The significant phylogenetic structure implied that NA sequences from the same subtype would tend to group in the same clade in the phylogenetic tree.
3.6. The Debate of Mutation versus Selection on IAV
Natural selection played a dominant role in structuring different subtypes of NA genes for IAV in Canada. The significant clustering patterns supported by correspondence analysis on the RSCU values and the phylogenetic signaling randomization test clearly support the evolutionary differentiation of NA genes of different subtypes of IAV in Canada. Neutrality plot further evidenced the dominance of natural selection on NA gene in Canada.
Our study, therefore, is different from previous studies on codon usage patterns of IAV because the previous studies showed that mutation selection should be prevailing [6, 8, 9]. Therefore, it is imperative to compare different subtypes found in different regions (either regional or continental levels) to better quantify the influence of environmental selection and mutational pressure on the evolution of virus genes.
Because different subtypes of IAV have utilized different optimal codons, flattened regression lines are always found in the neutrality plots, and phylogenetic signal is significant for the distribution of subtypes over the phylogenetic tree. It is concluded that different subtypes of IAV in Canada show evolutionary differentiation patterns, and natural selection plays a central role in structuring codon usage patterns for IAV in the region.
Conflict of Interests
There is no conflict of interests regarding the materials of the study.
This work is partially supported by a China Scholarship Council (CSC) graduate studentship.
- M. Archetti, “Codon usage bias and mutation constraints reduce the level of error minimization of the genetic code,” Journal of Molecular Evolution, vol. 59, no. 2, pp. 258–266, 2004.
- L. Duret and D. Mouchiroud, “Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis,” Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 8, pp. 4482–4487, 1999.
- P. Tao, L. Dai, M. Luo, F. Tang, P. Tien, and Z. Pan, “Analysis of synonymous codon usage in classical swine fever virus,” Virus Genes, vol. 38, no. 1, pp. 104–112, 2009.
- E. N. Moriyama and J. R. Powell, “Codon usage bias and tRNA abundance in Drosophila,” Journal of Molecular Evolution, vol. 45, no. 5, pp. 514–523, 1997.
- G. P. Holmquist and J. Filipski, “Organization of mutations along the genome: a prime determinant of genome evolution,” Trends in Ecology and Evolution, vol. 9, no. 2, pp. 65–69, 1994.
- X. Liu, C. Wu, and A. Y.-H. Chen, “Codon usage bias and recombination events for neuraminidase and hemagglutinin genes in Chinese isolates of influenza A virus subtype H9N2,” Archives of Virology, vol. 155, no. 5, pp. 685–693, 2010.
- Y. Zhang, Y. Liu, W. Liu et al., “Analysis of synonymous codon usage in Hepatitis A virus,” Virology Journal, vol. 8, p. 174, 2011.
- T. Zhou, W. Gu, J. Ma, X. Sun, and Z. Lu, “Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses,” BioSystems, vol. 81, no. 1, pp. 77–86, 2005.
- E. H. M. Wong, D. K. Smith, R. Rabadan, M. Peiris, and L. L. M. Poon, “Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus,” BMC Evolutionary Biology, vol. 10, no. 1, article 253, 2010.
- S. Kryazhimskiy, G. A. Bazykin, and J. Dushoff, “Natural selection for nucleotide usage at synonymous and nonsynonymous sites in influenza A virus genes,” Journal of Virology, vol. 82, no. 10, pp. 4938–4945, 2008.
- X. Gong, S. Fan, Z. Cui, and X. Li, “The CpG suppression of polymerase segments and its impact on codon usage bias in H1N1 influenza virus,” Acta Biophysica Sinica, vol. 27, pp. 537–544, 2011.
- N. Goni, A. Iriarte, V. Comas et al., “Pandemic influenza A virus codon usage revisited: biases, adaptation and implications for vaccine strain development,” Virology Journal, vol. 9, p. 263, 2012.
- K. Fancher and W. Hu, “Codon bias of influenza a viruses and their hosts,” American Journal of Molecular Biology, vol. 1, pp. 174–182, 2011.
- S. Katoh, “MAFFT multiple sequence alignment software version 7: improvements in performance and usability,” Molecular Biology and Evolution, vol. 30, no. 4, pp. 772–780, 2013.
- K. Katoh, K. Misawa, K.-I. Kuma, and T. Miyata, “MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform,” Nucleic Acids Research, vol. 30, no. 14, pp. 3059–3066, 2002.
- P. M. Sharp and W.-H. Li, “Codon usage in regulatory genes in Escherichia coli does not reflect selection for “rare” codons,” Nucleic Acids Research, vol. 14, no. 19, pp. 7737–7749, 1986.
- F. Wright, “The “effective number of codons” used in a gene,” Gene, vol. 87, pp. 23–29, 1990.
- M. Greenacre and O. Nenadic, “Ca: a package for computation and visualization of simple, multiple and joint correspondence analysis,” 2012, http://www.carme-n.org/.
- R Development Core Team, R: A Language and Environment For Statistical Computing, Vienna, Austria, R Foundation for Statistical Computing, Vienna, Austria, 2011, http://www.R-project.org.
- S. P. Blomberg, T. Garland Jr., and A. R. Ives, “Testing for phylogenetic signal in comparative data: behavioral traits are more labile,” Evolution, vol. 57, no. 4, pp. 717–745, 2003.
- R. P. Freckleton, P. H. Harvey, and M. Pagel, “Phylogenetic analysis and comparative data: a test and review of evidence,” American Naturalist, vol. 160, no. 6, pp. 712–726, 2002.
- M. Pagel, “Inferring the historical patterns of biological evolution,” Nature, vol. 401, no. 6756, pp. 877–884, 1999.
- L. Revell, “phytools: an R package for phylogenetic comparative biology (and other things),” Methods in Ecology and Evolution, vol. 3, pp. 217–223, 2012.
- M. N. Price, P. S. Dehal, and A. P. Arkin, “FastTree 2—approximately maximum-likelihood trees for large alignments,” PLoS One, vol. 5, no. 3, Article ID e9490, 2010.
- E. Paradis, J. Claude, and K. Strimmer, “APE: analyses of phylogenetics and evolution in R language,” Bioinformatics, vol. 20, no. 2, pp. 289–290, 2004.
- T. Britton, B. Oxelman, A. Vinnersten, and K. Bremer, “Phylogenetic dating with confidence intervals using mean path lengths,” Molecular Phylogenetics and Evolution, vol. 24, no. 1, pp. 58–65, 2002.
- Q. Liu and Q. Xue, “Comparative studies on codon usage pattern of chloroplasts and their host nuclear genes in four plant species,” Journal of Genetics, vol. 84, no. 1, pp. 55–62, 2005.
- A. Kawabe and N. T. Miyashita, “Patterns of codon usage bias in three dicot and four monocot plant species,” Genes and Genetic Systems, vol. 78, no. 5, pp. 343–352, 2003.