Analysis of Synonymous Codon Usage Bias in Flaviviridae Virus
Background. Flaviviridae viruses are single-stranded, positive-sense RNA viruses that constantly threaten human health and are transmitted by mosquitoes, ticks, and sandflies. Considering the recent increase in the prevalence of this virus family and its risk potential, we investigated its codon usage pattern to understand its evolutionary processes and to provide useful data for developing medications against most Flaviviridae viruses. Results. The overall extent of codon usage bias in 65 Flaviviridae viruses is low; the GC content averages 50.5%, with a maximum of 55.9% and a minimum of 40.2%. ENC values of Flaviviridae virus genes vary from 48.75 to 57.83 with a mean value of 55.56. U- and A-ended codons are preferred in Flaviviridae viruses. Correlation analysis shows a significant positive correlation between ENC values and GC content at the third codon position in this virus family. ENC analysis, neutrality plot analysis, and correlation analysis revealed that the codon usage bias of all the viruses was affected mainly by natural selection. Meanwhile, according to correspondence analysis (CoA) based on RSCU and phylogenetic analysis, the Flaviviridae viruses mainly fall into two groups, Group I (Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and others) and Group II (West Nile virus lineage 2, Japanese encephalitis virus, Usutu virus, Kedougou virus, and others). Conclusions. Overall, the codon usage pattern is affected not only by compositional constraints but also by natural selection. Phylogenetic analysis also illustrates that codon usage bias can serve as an effective means of evolutionary classification in Flaviviridae viruses.
1. Introduction
All amino acids except methionine (Met) and tryptophan (Trp) are encoded by more than one synonymous codon. The unequal use of alternative synonymous codons is referred to as codon usage bias, and it arises through long-term accumulation. As an important evolutionary phenomenon, synonymous codon usage bias is well known to exist in a wide range of species, from prokaryotes to eukaryotes. Compositional constraints and natural selection are thought to be the two main factors influencing codon usage variation among genes in different organisms [2, 3].
Flaviviridae viruses are single-stranded, positive-sense RNA viruses, such as Zika virus, Dengue virus, Yellow fever virus, and Japanese encephalitis virus, that constantly threaten humans and are transmitted by mosquitoes, ticks, and sandflies. Because their hosts include both vertebrates and invertebrates, most Flaviviridae viruses are associated with human diseases. For example, Dengue virus, Japanese encephalitis virus, and Zika virus are transmitted by mosquitoes. Dengue virus comprises four serotypes (DENV1 to DENV4); infection may cause symptoms ranging from mild dengue fever to dengue hemorrhagic fever and even dengue shock syndrome, and stabilizing selection acts on its codon usage bias. According to the WHO, Japanese encephalitis virus caused a total of 27,059 cases during 2006–2009, of which 86% were from China and India; 20–30% of cases were fatal, and 30–50% of survivors developed serious postinfection neurological sequelae. Japanese encephalitis virus has a low codon usage bias that is influenced by both mutational pressure and natural selection. Zika virus, which has caused numerous cases of microcephaly in Brazil, has been spreading rapidly to other parts of the world since 2015. Zika virus coding sequences show relatively conserved, genotype-specific evolution of codon usage bias.
Powassan virus is transmitted by ticks, whereas Yellow fever virus and Spondweni virus are transmitted by mosquitoes. Powassan virus is a fatal, neurotropic virus that has become an emerging danger worldwide, with a 671% rise in cases in the last 18 years. Yellow fever virus causes yellow fever, which is endemic in many African and South American countries. Spondweni virus can cause a self-limiting febrile illness characterized by headache, myalgia, nausea, and arthralgia, similar to Zika virus infection. Codon usage patterns of some members of the Flaviviridae, such as Zika virus and Dengue virus, have been studied, but the codon usage characteristics of the family as a whole have not been reported to date. Considering the recent increase in the prevalence of this virus family and its risk potential, we investigated its codon usage pattern to understand its evolutionary processes and to provide useful data for developing medications against Flaviviridae viruses.
2. Materials and Methods
2.1. Genetic Material
The complete sequences of 65 Flaviviridae viruses were downloaded from NCBI (http://www.ncbi.nlm.nih.gov) and the detailed information about the viruses is listed in Table 1. The ORFs of the viruses were identified by DNAStar.
2.2. Nucleotide Composition Analysis
The following compositional properties were calculated for the coding sequences of the Flaviviridae virus genomes: (i) overall GC content; (ii) overall frequency of each nucleotide (A%, C%, U%, and G%); (iii) frequency of each nucleotide at the third site of the synonymous codons (A3S%, C3S%, U3S%, and G3S%); (iv) frequency of G + C at the third synonymous codon positions (GC3S%); (v) frequency of G + C at the third codon positions (GC3) and the mean frequency of G + C at the first and second positions (GC12). AUG and UGG are the only codons for methionine and tryptophan, respectively, and the termination codons UAA, UAG, and UGA do not encode any amino acid. Therefore, these five codons were excluded from the analysis. Nucleotide composition was calculated using the program CodonW 1.4.2.
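The compositional properties listed above can be illustrated with a minimal sketch. It is not the CodonW or EMBOSS CUSP implementation: the function names `gc_fraction` and `composition` are hypothetical, the third-position frequencies are simple proportions among the retained codons (CodonW normalizes its A3s, C3s, U3s, and G3s indices differently), and the five excluded codons are dropped from both the third-position and the GC12 calculations as an assumption.

```python
# Met, Trp, and stop codons, which the text excludes from the analysis.
EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def gc_fraction(bases):
    """Fraction of G or C among the given bases (0.0 for empty input)."""
    bases = list(bases)
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def composition(cds):
    """Compositional properties of one coding sequence (DNA or RNA alphabet)."""
    rna = cds.upper().replace("T", "U")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    # Keep unambiguous codons that have at least one synonymous alternative.
    syn = [c for c in codons if set(c) <= set("ACGU") and c not in EXCLUDED]
    third = [c[2] for c in syn]

    stats = {"GC": gc_fraction(rna), "GC3S": gc_fraction(third)}
    for base in "ACGU":
        stats[base + "3S"] = third.count(base) / len(third) if third else 0.0
    gc1 = gc_fraction(c[0] for c in syn)
    gc2 = gc_fraction(c[1] for c in syn)
    stats["GC12"] = (gc1 + gc2) / 2
    return stats


if __name__ == "__main__":
    # Toy sequence: AUG GCU GCC UAA (only GCU and GCC are retained).
    print(composition("ATGGCTGCCTAA"))
```

All values are returned as fractions between 0 and 1; multiply by 100 to obtain the percentages used in the text.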
2.3. Effective Number of Codons (ENC) Analysis
ENC analysis was used to quantify the extent of codon usage bias in the viral coding sequences, independently of gene length and the number of amino acids. ENC values range from 20 to 61; the larger the value, the weaker the codon preference. An ENC of 20 indicates that only one synonymous codon is used for each amino acid, whereas a value of 61 means that all synonymous codons are used equally for each amino acid. Generally, a coding sequence is considered to have significant codon bias when its ENC value is less than or equal to 35.
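As an illustration of how an ENC value between 20 and 61 can arise, the following sketch uses the homozygosity-based estimator usually attributed to Wright (1990). The text does not state which variant or software produced the ENC values, so the estimator, the function name `effective_number_of_codons`, and the handling of sparse data (amino acids observed fewer than twice are skipped) are assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Amino acid -> list of its codons (stop codons left out).
SYNONYMOUS = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        SYNONYMOUS.setdefault(_aa, []).append(_codon)

# Number of amino acids in each degeneracy class (2-, 3-, 4-, 6-fold).
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds):
    """Estimate ENC for one coding sequence (assumed Wright-style estimator)."""
    rna = cds.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3))

    homozygosity = {}  # degeneracy class -> list of F values
    for codons in SYNONYMOUS.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        f = (n * sum((counts[c] / n) ** 2 for c in codons) - 1) / (n - 1)
        if f > 0:
            homozygosity.setdefault(k, []).append(f)

    mean_f = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    # If isoleucine is missing, approximate F3 from the neighbouring classes.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in CLASS_SIZES):
        raise ValueError("sequence too short to estimate ENC")

    enc = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(enc, 61.0)  # the estimate is capped at 61
```

With this estimator, a sequence that always uses the same codon for each amino acid yields 20, and a sequence that uses all synonymous codons equally reaches the cap of 61.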
2.4. ENC-Plot Analysis
To determine the major factors affecting codon usage bias, an ENC-plot was constructed with the ENC values plotted against the GC3S values. If the points lie on or around the standard curve, the codon usage of the given genes is constrained only by mutational pressure. Otherwise, the codon usage pattern is influenced by other factors, such as natural selection. The standard ENC values were calculated using the equation $\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \dfrac{29}{s^{2} + (1 - s)^{2}}$, where $s$ represents the given (G+C)3S value expressed as a fraction.
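The standard curve can be sketched in a few lines, assuming the usual expression for the expected ENC under mutational pressure alone, 2 + s + 29 / (s² + (1 − s)²), with s the GC3S fraction. The function name `expected_enc` is hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is shaped only by GC3S composition.

    gc3s is the G+C fraction at synonymous third positions (0 to 1).
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


if __name__ == "__main__":
    # Points of the standard curve for plotting against observed ENC values.
    for step in range(0, 11):
        s = step / 10
        print(f"GC3S={s:.1f}  expected ENC={expected_enc(s):.2f}")
```

As a rough check against the values reported in this study, a GC3S of 0.4483 gives an expected ENC of about 59.8, which lies above the observed mean ENC of 54.58; a point at those coordinates therefore falls below the standard curve.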
2.5. Neutrality Plot Analysis
The neutrality plot is also known as neutral evolution analysis. It is used to compare the influences of mutation pressure and natural selection on the codon usage patterns of the virus coding sequences by plotting the GC12 values of the synonymous codons against the GC3 values. The GC12 and GC3 values of the Flaviviridae viruses were calculated with the EMBOSS CUSP program and then subjected to neutrality plot analysis.
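The regression behind a neutrality plot can be sketched as an ordinary least-squares fit of GC12 on GC3. The function name `neutrality_regression` and the toy data are hypothetical; reading the slope as the relative contribution of mutation pressure (and one minus the slope as that of natural selection) follows the interpretation used in the Results of this article.

```python
def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x).

    Returns (slope, intercept, r_squared).
    """
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0:
        raise ValueError("GC3 values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical per-virus values, one pair per coding sequence.
    gc3 = [0.36, 0.41, 0.45, 0.52, 0.58]
    gc12 = [0.47, 0.48, 0.47, 0.49, 0.48]
    slope, intercept, r2 = neutrality_regression(gc3, gc12)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}, r^2={r2:.3f}")
    print(f"mutation pressure ~ {slope:.1%}, natural selection ~ {1 - slope:.1%}")
```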
2.6. Relative Synonymous Codon Usage (RSCU) Analysis
The RSCU values of the coding sequences were analyzed to characterize the synonymous codon usage pattern independently of amino acid composition and coding region size, following a described method. The RSCU values were calculated as $\mathrm{RSCU}_{ij} = \dfrac{x_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}}$, where $x_{ij}$ represents the number of occurrences of the $j$th codon for the $i$th amino acid and $n_i$ represents the number of synonymous codons (the degeneracy) for the $i$th amino acid, which ranges from 1 to 6.
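A minimal sketch of the RSCU calculation: each codon count is divided by the mean count of its synonymous family. The function name `rscu` is hypothetical, and assigning 0.0 to codons of amino acids that never occur in the sequence is an assumption, since the ratio is undefined in that case.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """RSCU for the 59 codons that have synonymous alternatives."""
    rna = cds.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3))

    # Group codons by amino acid, skipping stops, Met (M), and Trp (W).
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count divided by the family mean (total / n_i).
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


if __name__ == "__main__":
    # Toy sequence with four alanine codons: GCU x3 and GCC x1.
    result = rscu("GCTGCTGCTGCC")
    print({c: result[c] for c in ("GCU", "GCC", "GCA", "GCG")})
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks a codon used less often.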
2.7. Correspondence Analysis
Correspondence analysis (CoA) is an effective method for identifying the major trends in codon usage patterns among viral coding sequences. Each coding region was represented as a 59-dimensional vector corresponding to the RSCU value of each synonymous codon (excluding AUG, UGG, and stop codons). In this research, the CoA of the Flaviviridae viruses was performed with CodonW.
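To make the idea of an axis "accounting for" a share of the total variation concrete, the sketch below computes the share of total inertia carried by the first axis of a correspondence analysis, using standardized residuals and power iteration. It is a simplified illustration, not CodonW's procedure: the function name `first_axis_inertia_share` is hypothetical, the input is any non-negative table with one row per virus and one column per codon, and rows or columns that sum to zero are rejected as an assumption.

```python
import math


def first_axis_inertia_share(table, iterations=200):
    """Share of total inertia explained by the first CoA axis."""
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(table[i][j] for i in range(n_rows)) / total
                for j in range(n_cols)]
    if min(row_mass) <= 0 or min(col_mass) <= 0:
        raise ValueError("remove rows or columns that sum to zero")

    # Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j).
    resid = [[(table[i][j] / total - row_mass[i] * col_mass[j])
              / math.sqrt(row_mass[i] * col_mass[j])
              for j in range(n_cols)] for i in range(n_rows)]
    inertia = sum(x * x for row in resid for x in row)
    if inertia < 1e-12:
        return 0.0  # rows and columns are independent: nothing to explain

    # Power iteration for the largest eigenvalue of resid^T * resid.
    vec = [float(j + 1) for j in range(n_cols)]
    eigenvalue = 0.0
    for _ in range(iterations):
        u = [sum(resid[i][j] * vec[j] for j in range(n_cols))
             for i in range(n_rows)]
        w = [sum(resid[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        vec = [x / norm for x in w]
        eigenvalue = norm  # converges to the first principal inertia
    return eigenvalue / inertia
```

Power iteration converges slowly when the first two axes carry similar inertia, so a full singular value decomposition is preferable for real data sets.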
2.8. Correlation Analysis
Correlation analysis was carried out with the statistical software SPSS22 to identify the factors influencing synonymous codon usage patterns. The parameters of the viruses were obtained from the EMBOSS CUSP program and CodonW.
2.9. Phylogenetic Analysis
The evolutionary processes of viruses significantly influence their codon usage pattern. To determine the evolutionary relationships between different viruses, phylogenetic analysis based on the nucleotide sequences of the viral coding regions was performed using MEGA7 software.
3. Results
3.1. Nucleotide Composition of 65 Flaviviridae Viruses
The nucleotide content of the 65 Flaviviridae coding sequences was calculated, giving the mean ± SD values of A%, U%, G%, C%, and GC% (Table 2); the mean GC content was 50.5%, with a maximum of 55.9% and a minimum of 40.2%. Further, to gain insight into its potential role in shaping the codon usage pattern, the base contents at the third codon position were also calculated: A3S%, U3S%, G3S%, C3S%, and GC3S% in these viruses were 33.11 ± 0.0405, 34.54 ± 0.0253, 27.01 ± 0.0104, 29.14 ± 0.0275, and 44.83 ± 0.0508 (mean ± SD), respectively. U3S% was distinctly the highest and G3S% the lowest of the base contents at the third position (Table 2). The CAI results show that, in relation to E.human, the CAI values of the Flaviviridae viruses range from 0.673 to 0.740, with an average value of 0.714 and an SD of 0.0163 (Table 1).
3.2. The ENC-GC3s Plots Analysis
The mean ENC value of the viruses was 54.58, the highest was 57.83, and the lowest was 48.75; the ENC values of 61 viruses were greater than 50, and those of 4 viruses were less than 50 (Table 2). This indicates that the codon usage bias in Flaviviridae viruses is low. To investigate the factors affecting Flaviviridae virus codon usage bias, the ENC values were plotted against the GC3S values. In the ENC versus GC3S graph, the curve represents the expected ENC values when mutation is the only factor, and the points represent the actual ENC values of the coding sequences of the Flaviviridae viruses (Figure 1). According to the ENC-GC3S plot, all the viruses clustered together below the expected ENC curve, which indicates that, in addition to mutation pressure, other factors, such as translational selection, also influence the codon usage pattern of Flaviviridae virus coding sequences.
3.3. The RSCU Analysis
As shown in Table 3, most of the high-frequency codons for the 18 amino acids in the viruses are A/U-ended. For example, there are 53 viruses with high-frequency A/U-ended codons for Phenylalanine, accounting for 83.07%; those for Isoleucine account for 78.46%, and those for Valine account for 86.15%. In other words, Flaviviridae viruses prefer A/U-ended codons (Figure 2).
We performed CoA on the RSCU values, which revealed that the first, second, third, and fourth axes accounted for 50.68%, 9.16%, 3.51%, and 1.63% of the total variation, respectively. Thus, the codon usage bias could be explained mainly by the first and second axes, whose values were plotted to show the distribution of synonymous codon usage patterns. Each point represents a virus, and the closer two points are, the more similar the codon usage patterns of the corresponding viruses are. As shown in Figure 3, Flaviviridae viruses can be divided into two groups plus a set of remaining viruses: Group A includes Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and Wesselsbron virus, and Group B includes West Nile virus lineage 2, Japanese encephalitis virus, Usutu virus, and Kedougou virus.
3.4. Neutrality Plot Analysis
In the neutrality plot analysis (Figure 4), a significant positive correlation was observed between the GC12 and GC3 values of Flaviviridae viruses (r2 = 0.06). The slope of the regression line was 0.062, indicating that mutation pressure accounted for 6.2% and natural selection for 93.8% of the influence on codon usage. This demonstrates the dominant influence of natural selection. In addition, these viruses can be grouped into two clusters, Group A (Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and others) and Group B (West Nile virus lineage 2, Japanese encephalitis virus, Usutu virus, Kedougou virus, and others), which is similar to the result of the RSCU analysis.
3.5. Correlation Analysis
As shown in Table 4, the ENC values of Flaviviridae viruses had significant correlations with A%, C%, G%, A3S%, C3S%, and GC3S%. Additionally, GC3S% had a significant correlation with GC%. These data suggest that nucleotide constraint influences synonymous codon usage.
ENC values have significant negative correlations with Gravy and Aroma. In addition, U3S%, G3S%, C3S%, and GC3S% have significant negative correlations with Gravy values, and A3S% has a significant negative correlation with Aroma values. These results indicate that natural selection, along with mutational pressure, also influenced codon usage bias.
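The Gravy and Aroma indices are not defined in the text. Assuming they follow the common definitions, Gravy is the mean Kyte-Doolittle hydropathy of the encoded protein and Aroma is the fraction of aromatic residues (Phe, Tyr, Trp). The sketch below illustrates both; the function names `gravy` and `aromaticity` are hypothetical, and the input is the translated protein sequence in one-letter code.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _residues(protein):
    """Keep only the 20 standard residues; fail on empty input."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return residues


def gravy(protein):
    """Mean hydropathy of the protein (assumed definition of Gravy)."""
    residues = _residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein):
    """Fraction of Phe, Tyr, and Trp residues (assumed definition of Aroma)."""
    residues = _residues(protein)
    return sum(aa in "FYW" for aa in residues) / len(residues)


if __name__ == "__main__":
    peptide = "MKFLVWAY"  # hypothetical peptide
    print(f"Gravy={gravy(peptide):.3f}, Aroma={aromaticity(peptide):.3f}")
```

Both indices describe the encoded protein rather than the nucleotide sequence, which is why their correlation with codon usage measures is read as a sign of selection acting beyond nucleotide composition.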
3.6. Phylogenetic Analysis of Flaviviridae Viruses
To evaluate the effects of evolutionary processes on codon usage patterns, phylogenetic analysis was carried out. The results show that the 65 Flaviviridae viruses can be divided into two groups (Figure 5), Group I and Group II. Group I includes Kedougou virus, Louping ill virus, West Nile virus lineage 2, and Yaounde virus, and the range of their GC3S content is not extensive (0.364 ≤ GC3S ≤ 0.582). Group II includes Omsk hemorrhagic fever virus, Alkhurma virus, Tick-borne encephalitis virus, and Spanish goat encephalitis virus, and the range of their GC3S content is relatively smaller (0.345 ≤ GC3S ≤ 0.454). These results suggest that the more closely related the viruses are, the more similar their codon usage bias is.
4. Discussion
The study of codon usage patterns in viruses can reveal useful information about overall viral survival, fitness, and evolution. In this research, the majority of Flaviviridae viruses have a weak codon bias, with a mean ENC value of 54.58. This is in accordance with earlier studies on Tembusu virus and West Nile virus, which have a low codon usage bias [16–18]. According to the CodonW results (Table 2), the content of A and G is the highest, and RSCU analysis indicates that Flaviviridae viruses prefer A/U-ended codons.
Other RNA viruses, such as polioviruses, H5N1 influenza virus, and SARS-CoVs, have mean ENC values of 53.75, 50.91, and 48.99, respectively [19–21]; we therefore conjecture that a weak codon bias allows RNA viruses to replicate efficiently in host cells. As the ENC-GC3S plot analysis shows, mutational pressure and other factors shaped the codon usage patterns of Flaviviridae viruses, which is similar to hepatitis C virus. In fact, Hongju et al. previously reported that the codon usage bias of ZIKV is weak and that its patterns are influenced not only by mutation pressure but also by translational selection, aromaticity, and hydrophobicity. Previous studies on Zika virus [14, 23] observed greater frequencies of A3S/G3S than of U3S, but some viruses show the opposite pattern; for example, in Aedes flavivirus U3S% was 0.2994 and G3S% was 0.279, and in Alkhurma virus U3S% was 0.3617 and G3S% was 0.2773. Taking all results together, U3S% was overall the highest and G3S% the lowest. Since Flaviviridae viruses prefer A/U-ended codons and A3S% has a remarkable correlation with ENC (Table 4), we think that the compositional constraint shaping the synonymous codon bias came from the content of nucleotides A and U at the third codon position. This result differs from many reports in which the compositional constraints influencing codon usage bias come from G and C contents (Zhou et al. 2004) [20, 24]. In addition, the correlations of both Gravy values and Aroma values with ENC values are significant, which indicates the role of natural selection in shaping the codon usage patterns of the Flaviviridae viruses. Moreover, the neutrality plot attributes 93.8% of the influence on the codon usage patterns of this family to natural selection and 6.2% to mutation pressure (Figure 4).
In the CoA-RSCU analysis, the Flaviviridae viruses can be divided into two groups plus a set of remaining viruses. Viruses with similar codon usage patterns cluster together, which is consistent with the results of the neutrality plot analysis and the phylogenetic tree. Overall, Yellow fever virus, Apoi virus, Tembusu virus, and Dengue virus 1 always clustered together.
In summary, combining the nucleotide composition analysis, ENC-plot analysis, and correlation analysis, it is clear that both mutation pressure and natural selection influence the codon usage patterns of Flaviviridae viruses. In addition, most of the Flaviviridae viruses can be classified into two categories according to the findings of the CoA-RSCU analysis, neutrality plot analysis, and phylogenetic analysis. Codon usage patterns were similar between different virus species in the same group.
5. Conclusions
In this study, the majority of Flaviviridae viruses have a weak codon usage bias, which helps them adapt to diverse hosts and varied environments. The Flaviviridae viruses can also be classified into two groups according to their codon usage patterns. Their codon usage patterns were influenced by natural selection, which accounts for 93.8%, and mutation pressure, which accounts for 6.2%. The information from this research may not only help to understand the evolution of Flaviviridae viruses but also have potential value for developing vaccines against them.
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no competing interests.
This work was supported by the research grants from the Department of Education of Sichuan Province, China (13ZB0294), and Sichuan Agricultural University (00770114).
K. Szuhan, L. Yingray, L. Chingyen et al., “Dengue virus-induced ER stress is required for autophagy activation, viral replication, and pathogenesis both in vitro and in vivo,” Scientific Reports, vol. 8, no. 1, 2018.
L. R. Edgar, M. I. Salazar, M. J. Lopez, S. Juan, S. V. Alejandro, and G. Xianwu, “Large-scale genomic analysis of codon usage in dengue virus and evaluation of its phylogenetic dependence,” BioMed Research International, vol. 2014, Article ID 851425, 9 pages, 2014.
M. B. Azeem, N. Izza, Q. Raheel, and T. Yigang, “Evolution of codon usage in zika virus genomes is host and vector specific,” Emerging Microbes and Infections, vol. 5, no. 10, p. e107, 2016.
J. J. V. Lindern, S. Aroner, N. D. Barrett, J. A. Wicker, C. T. Davis, and A. D. Barrett, “Genome analysis and phylogenetic relationships between east, central and west African isolates of Yellow fever virus,” Journal of General Virology, vol. 87, no. 4, pp. 895–907, 2006.
A. D. Haddow, F. Nasar, H. Guzman et al., “Genetic characterization of spondweni and zika viruses and susceptibility of geographically distinct strains of aedes aegypti, aedes albopictus and culex quinquefasciatus (diptera: culicidae) to spondweni virus,” PLOS Neglected Tropical Diseases, vol. 10, no. 10, Article ID e0005083, 2016.
J. F. Peden, “Analysis of Codon Usage,” University of Nottingham, vol. 90, no. 1, pp. 73-74, 2000.
H. Wang, S. Liu, B. Zhang, and W. Wei, “Analysis of synonymous codon usage bias of zika virus and its adaption to the hosts,” PLoS ONE, vol. 11, no. 11, Article ID e0166260, 2016.
Y. Yuan, S. H. Huang, C. K. Wang, and H. J. Zhi, “Analysis on codon usage and evolution of soybean mosaic virus,” Soybean Science, 2014.
Z. Jie, W. Meng, W. Q. Liu et al., “Analysis of codon usage and nucleotide composition bias in polioviruses,” Virology Journal, vol. 8, no. 1, p. 146, 2011.