BioMed Research International


Research Article | Open Access

Volume 2019 | Article ID 5857285 | 12 pages

Analysis of Synonymous Codon Usage Bias in Flaviviridae Virus

Academic Editor: Sankar Subramanian
Received 12 Feb 2019
Revised 20 May 2019
Accepted 03 Jun 2019
Published 27 Jun 2019


Background. Flaviviridae viruses are single-stranded, positive-sense RNA viruses that pose a constant threat to humans and are transmitted by mosquitoes, ticks, and sandflies. Considering the recent increase in the prevalence of this virus family and its risk potential, we investigated its codon usage pattern to understand its evolutionary processes and to provide data useful for developing medications against most Flaviviridae viruses. Results. The overall extent of codon usage bias in the 65 Flaviviridae viruses is low. The average GC content is 50.5%, with a maximum of 55.9% and a minimum of 40.2%. ENC values of Flaviviridae virus genes vary from 48.75 to 57.83, with a mean value of 55.56. U- and A-ended codons are preferred in the Flaviviridae viruses. Correlation analysis shows a significant positive correlation between the ENC value and the GC content at the third nucleotide position in this virus family. The ENC analysis, neutrality plot analysis, and correlation analysis revealed that the codon usage bias of all the viruses was affected mainly by natural selection. Meanwhile, according to correspondence analysis (CoA) based on RSCU and phylogenetic analysis, the Flaviviridae viruses are mainly made up of two groups, Group I (Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and others) and Group II (West Nile virus lineage 2, Japanese encephalitis virus, Usutu virus, Kedougou virus, and others). Conclusions. Overall, the codon usage bias is affected not only by compositional constraints but also by natural selection. Phylogenetic analysis also illustrates that codon usage bias can serve as an effective means of evolutionary classification in Flaviviridae viruses.

1. Introduction

All amino acids except methionine (Met) and tryptophan (Trp) are encoded by more than one synonymous codon. The unequal use of alternative synonymous codons is referred to as codon usage bias, and it is the result of long-term accumulation. Synonymous codon usage bias is an important evolutionary phenomenon that is well known to exist in a wide range of species from prokaryotes to eukaryotes [1]. Compositional constraints and natural selection are thought to be the two main factors influencing codon usage variation among genes in different organisms [2, 3]. Flaviviridae viruses, such as Zika virus, Dengue virus, Yellow fever virus, and Japanese encephalitis virus, are single-stranded, positive-sense RNA viruses that pose a constant threat to humans and are transmitted by mosquitoes, ticks, and sandflies. Because their hosts include both vertebrates and invertebrates, most Flaviviridae viruses are associated with human diseases. For example, Dengue virus, Japanese encephalitis virus, and Zika virus are transmitted by mosquitoes. Dengue virus comprises four serotypes (DENV1 to DENV4); infection may cause symptoms ranging from mild dengue fever to dengue hemorrhagic fever and even dengue shock syndrome [4], and stabilizing selection acts on its codon usage bias [5]. According to the WHO, Japanese encephalitis virus caused a total of 27,059 cases during 2006–2009, of which 86% were from China and India; 20–30% of cases were fatal, and 30–50% of the survivors developed serious postinfection neurological sequelae. Japanese encephalitis virus has a low codon usage bias that is influenced by both mutational pressure and natural selection [6]. Zika virus, which caused a number of microcephaly cases in Brazil, has been spreading rapidly to other parts of the world since 2015. Zika coding sequences show relatively conserved and genotype-specific evolution of codon usage bias [7].
Powassan virus, yellow fever virus, and Spondweni virus are transmitted by ticks. Powassan virus is a fatal, neurotropic virus whose case numbers have risen by 671% in the last 18 years, making it an emerging danger worldwide [8]. Yellow fever virus causes yellow fever, which is endemic in many African and South American countries [9]. Spondweni virus causes a self-limiting febrile illness characterized by headache, myalgia, nausea, and arthralgia, similar to Zika virus infection [10]. The codon usage patterns of some members of the Flaviviridae, such as Zika virus [7] and Dengue virus [5], have been studied, but the codon usage characteristics of the Flaviviridae viruses as a whole have not been reported to date. Considering the recent increase in the prevalence of this virus family and its risk potential, we investigated its codon usage pattern to understand its evolutionary processes and to provide data useful for developing medications against Flaviviridae viruses.

2. Materials and Methods

2.1. Genetic Material

The complete sequences of 65 Flaviviridae viruses were downloaded from NCBI, and detailed information about the viruses is listed in Table 1. The ORFs of the viruses were identified with DNAStar.

Organism/Name | Accession ID | ENC | GC3s | GC12 | GC3 | Gravy | Aromo | CAI
Aedes flavivirus | NC_012932.1 | 58.97 | 0.502 | 0.501 | 0.5241 | -0.057 | 0.080 | 0.682
Alkhurma hemorrhagic fever virus | NC_004355.1 | 56.85 | 0.451 | 0.553 | 0.4625 | -0.797 | 0.062 | 0.715
Apoi virus | NC_003676.1 | 54.49 | 0.411 | 0.497 | 0.4193 | -0.751 | 0.075 | 0.713
Bagaza virus | NC_012534.1 | 54.46 | 0.429 | 0.508 | 0.5207 | -1.006 | 0.065 | 0.73
Banzi virus | NC_029054.1 | 52.79 | 0.518 | 0.504 | 0.4394 | -0.112 | 0.082 | 0.74
Bouboui virus | NC_033693.1 | 54.42 | 0.423 | 0.495 | 0.5486 | -0.882 | 0.073 | 0.722
Bussuquara virus | NC_009026.2 | 54.19 | 0.491 | 0.499 | 0.4367 | -0.151 | 0.082 | 0.712
Cell fusing agent virus | NC_001564.2 | 57.10 | 0.426 | 0.522 | 0.4365 | -0.599 | 0.078 | 0.683
Chaoyang virus | NC_017086.1 | 56.19 | 0.437 | 0.495 | 0.4515 | -0.894 | 0.076 | 0.724
Culex flavivirus | NC_008604.2 | 54.87 | 0.582 | 0.529 | 0.6034 | -0.019 | 0.095 | 0.702
Dengue virus 1 | NC_001477.1 | 52.15 | 0.424 | 0.476 | 0.4314 | -1.016 | 0.065 | 0.742
Dengue virus 2 | NC_001474.2 | 48.85 | 0.421 | 0.457 | 0.4587 | -0.226 | 0.077 | 0.701
Dengue virus 3 | NC_001475.2 | 49.43 | 0.432 | 0.464 | 0.468 | -0.219 | 0.078 | 0.696
Dengue virus 4 | NC_002640.1 | 50.90 | 0.445 | 0.470 | 0.4808 | -0.188 | 0.081 | 0.703
Donggang virus | NC_016997.1 | 57.18 | 0.440 | 0.501 | 0.4526 | -0.756 | 0.097 | 0.714
Edge Hill virus | NC_030289.1 | 53.61 | 0.416 | 0.491 | 0.4311 | -0.806 | 0.088 | 0.728
Entebbe bat virus | NC_008718.1 | 55.04 | 0.411 | 0.516 | 0.4254 | -0.905 | 0.060 | 0.71
Gadgets Gully virus | NC_033723.1 | 55.54 | 0.443 | 0.532 | 0.4574 | -0.872 | 0.068 | 0.735
Hanko virus | NC_030401.1 | 55.43 | 0.456 | 0.465 | 0.4866 | -0.021 | 0.100 | 0.675
Ilheus virus | NC_009028.2 | 56.12 | 0.447 | 0.535 | 0.4595 | -0.968 | 0.060 | 0.715
Japanese encephalitis virus | NC_001437.1 | 55.49 | 0.524 | 0.516 | 0.5523 | -0.209 | 0.081 | 0.72
Jugra virus | NC_033699.1 | 53.93 | 0.421 | 0.493 | 0.4343 | -0.873 | 0.069 | 0.718
Kadam virus | NC_033724.1 | 56.45 | 0.449 | 0.539 | 0.4596 | -0.851 | 0.066 | 0.729
Kamiti River virus | NC_005064.1 | 56.76 | 0.407 | 0.511 | 0.423 | -0.666 | 0.081 | 0.681
Karshi virus | NC_006947.1 | 56.10 | 0.444 | 0.555 | 0.4546 | -0.804 | 0.064 | 0.717
Kedougou virus | NC_012533.1 | 53.46 | 0.583 | 0.544 | 0.6084 | -0.125 | 0.081 | 0.74
Kokobera virus | NC_009029.2 | 53.67 | 0.498 | 0.496 | 0.5271 | -0.161 | 0.083 | 0.711
Langat virus | NC_003690.1 | 55.94 | 0.454 | 0.555 | 0.4622 | -0.803 | 0.054 | 0.723
Louping ill virus | NC_001809.1 | 53.88 | 0.580 | 0.548 | 0.6059 | -0.150 | 0.080 | 0.736
Meaban virus | NC_033721.1 | 56.71 | 0.448 | 0.556 | 0.4639 | -0.894 | 0.069 | 0.721
Mercadeo virus | NC_027819.1 | 56.50 | 0.549 | 0.506 | 0.5734 | -0.059 | 0.101 | 0.699
Modoc virus | NC_003635.1 | 51.61 | 0.391 | 0.467 | 0.4469 | -0.701 | 0.092 | 0.727
Montana myotis leukoencephalitis virus | NC_004119.1 | 49.63 | 0.372 | 0.439 | 0.4083 | -0.127 | 0.090 | 0.694
Mosquito flavivirus | NC_021069.1 | 56.52 | 0.412 | 0.534 | 0.4148 | -0.667 | 0.066 | 0.673
Murray Valley encephalitis virus | NC_000943.1 | 54.42 | 0.434 | 0.498 | 0.4245 | -0.925 | 0.086 | 0.734
New Mapoon virus | NC_032088.1 | 56.42 | 0.445 | 0.530 | 0.4431 | -0.983 | 0.067 | 0.714
Nounane virus | NC_033715.1 | 54.51 | 0.496 | 0.497 | 0.5306 | -0.138 | 0.088 | 0.71
Ntaya virus | NC_018705.3 | 53.72 | 0.427 | 0.493 | 0.4402 | -0.904 | 0.070 | 0.728
Ochlerotatus caspius flavivirus | NC_034242.1 | 56.55 | 0.405 | 0.483 | 0.4159 | -0.612 | 0.092 | 0.696
Omsk hemorrhagic fever virus | NC_005062.1 | 56.13 | 0.443 | 0.545 | 0.4534 | -0.888 | 0.061 | 0.72
Palm Creek virus | NC_033694.1 | 54.98 | 0.498 | 0.495 | 0.5267 | 0.008 | 0.095 | 0.682
Paraiso Escondido virus | NC_027999.1 | 54.68 | 0.423 | 0.487 | 0.4398 | -0.804 | 0.082 | 0.723
Parramatta River virus | NC_027817.1 | 56.73 | 0.409 | 0.485 | 0.4202 | -0.646 | 0.093 | 0.693
Phnom Penh bat virus | NC_034007.1 | 52.49 | 0.405 | 0.461 | 0.4168 | -0.638 | 0.106 | 0.72
Powassan virus | NC_003687.1 | 55.72 | 0.442 | 0.545 | 0.4515 | -0.799 | 0.061 | 0.718
Quang Binh virus | NC_012671.1 | 56.38 | 0.564 | 0.523 | 0.4369 | -0.008 | 0.094 | 0.698
Rio Bravo virus | NC_003675.1 | 50.00 | 0.364 | 0.432 | 0.4070 | -0.117 | 0.095 | 0.689
Saboya virus | NC_033697.1 | 54.45 | 0.426 | 0.492 | 0.4370 | -0.832 | 0.077 | 0.724
Saumarez Reef virus | NC_033726.1 | 56.15 | 0.450 | 0.548 | 0.4641 | -0.919 | 0.071 | 0.728
Sepik virus | NC_008719.1 | 54.34 | 0.412 | 0.483 | 0.4262 | -0.811 | 0.083 | 0.717
Spanish goat encephalitis virus | NC_027709.1 | 55.92 | 0.449 | 0.559 | 0.4575 | -0.862 | 0.060 | 0.724
Spondweni virus | NC_029055.1 | 56.95 | 0.444 | 0.532 | 0.4526 | -0.938 | 0.058 | 0.715
St. Louis encephalitis virus | NC_007580.2 | 53.85 | 0.428 | 0.507 | 0.4395 | -0.988 | 0.067 | 0.727
Tamana bat virus | NC_003996.1 | 48.75 | 0.345 | 0.402 | 0.3560 | -0.674 | 0.126 | 0.704
Tembusu virus | NC_015843.2 | 54.69 | 0.434 | 0.503 | 0.4469 | -0.894 | 0.078 | 0.727
Tick-borne encephalitis virus | NC_001672.1 | 55.79 | 0.448 | 0.552 | 0.4560 | -0.777 | 0.064 | 0.722
Uganda S virus | NC_033698.1 | 52.78 | 0.430 | 0.469 | 0.4643 | -0.111 | 0.085 | 0.701
Usutu virus | NC_006551.1 | 55.13 | 0.516 | 0.511 | 0.5428 | -0.165 | 0.083 | 0.721
Wesselsbron virus | NC_012735.1 | 54.89 | 0.415 | 0.487 | 0.4303 | -0.801 | 0.086 | 0.717
West Nile virus 1 | NC_009942.1 | 56.15 | 0.438 | 0.522 | 0.5509 | -0.973 | 0.061 | 0.714
West Nile virus 2 | NC_001563.2 | 54.62 | 0.522 | 0.510 | 0.4432 | -0.150 | 0.083 | 0.729
Yaounde virus | NC_034018.1 | 52.50 | 0.514 | 0.505 | 0.5447 | -0.160 | 0.083 | 0.721
Yellow fever virus | NC_002031.1 | 54.54 | 0.413 | 0.509 | 0.4257 | -0.822 | 0.084 | 0.731
Yokose virus | NC_005039.1 | 54.33 | 0.403 | 0.483 | 0.4151 | -0.875 | 0.073 | 0.718
Zika virus | NC_012532.1 | 54.21 | 0.439 | 0.520 | 0.4495 | -0.904 | 0.063 | 0.728

2.2. Nucleotide Composition Analysis

The following compositional properties were calculated for the coding sequences of the Flaviviridae virus genomes: (i) overall GC content; (ii) overall frequency of each nucleotide (A%, C%, U%, and G%); (iii) frequency of each nucleotide at the third site of synonymous codons (A3S%, C3S%, U3S%, and G3S%); (iv) frequency of G + C at the third synonymous codon position (GC3S%); (v) frequency of G + C at the third codon position (GC3) and the mean frequency of G + C at the first and second positions (GC12). The codons AUG and UGG are the only codons for methionine and tryptophan, respectively, and the termination codons UAA, UAG, and UGA do not encode any amino acid. Therefore, these five codons were excluded from the analysis. Nucleotide composition was calculated using the program CodonW 1.4.2 [11].
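To make items (i) to (iv) concrete, the following is a minimal Python sketch of how these frequencies could be computed for one coding sequence. It is not the CodonW implementation: the function name `composition` is hypothetical, the input is assumed to be an in-frame sequence (U is converted to T), and each third-position frequency is simply taken as the share of that base among synonymous codons, which may differ in detail from CodonW's own definitions.

```python
from collections import Counter

# Codons excluded from the analysis: Met (ATG), Trp (TGG), and the three stops.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def composition(cds: str) -> dict:
    """Return overall and third-position base frequencies for one coding sequence."""
    seq = cds.upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty, in-frame coding sequence")

    # (i)-(ii) overall base frequencies and GC content
    overall = Counter(seq)
    freqs = {base: overall[base] / len(seq) for base in "ATGC"}
    freqs["GC"] = freqs["G"] + freqs["C"]

    # (iii)-(iv) third-position frequencies over synonymous codons only
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    third = Counter(
        c[2] for c in codons if c not in EXCLUDED and set(c) <= set("ATGC")
    )
    n_third = sum(third.values())
    for base in "ATGC":
        freqs[base + "3s"] = third[base] / n_third if n_third else float("nan")
    freqs["GC3s"] = freqs["G3s"] + freqs["C3s"]
    return freqs


if __name__ == "__main__":
    # Toy sequence (hypothetical, not one of the 65 genomes): Met-Ala-Ala-stop
    print(composition("ATGGCTGCCTAA"))
```

In the toy sequence only the two alanine codons (GCT and GCC) count toward the third-position values, because ATG and the stop codon are excluded.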

2.3. Effective Number of Codons (ENC) Analysis

ENC analysis was used to quantify the extent of codon usage bias of the viral coding sequences, regardless of the length of a given gene and the number of amino acids. ENC values range from 20 to 61; the larger the value, the weaker the codon preference. An ENC of 20 indicates that only one synonymous codon is used for each amino acid, and a value of 61 means that all synonymous codons are used equally for each amino acid. Generally, a coding sequence is considered to have significant codon bias when its ENC value is less than or equal to 35 [7].
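The section does not state which estimator was used, so the sketch below assumes Wright's classical formulation, in which a homozygosity value F = (n Σp² − 1)/(n − 1) is computed for each amino acid and averaged within each degeneracy class, giving ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The function name `effective_number_of_codons` is hypothetical, and the handling of missing amino acids is deliberately simple (an error is raised rather than applying any correction).

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def effective_number_of_codons(cds: str) -> float:
    """Estimate ENC for one in-frame coding sequence (assumed Wright-style estimator)."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group sense codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    # Homozygosity F per amino acid, collected by degeneracy class (2, 3, 4, 6).
    homozygosity = defaultdict(list)
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)

    enc = 2.0  # Met and Trp contribute one codon each
    for k, n_families in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if not homozygosity[k]:
            raise ValueError(f"no usable amino acid with {k} synonymous codons")
        mean_f = sum(homozygosity[k]) / len(homozygosity[k])
        enc += n_families / mean_f
    return min(enc, 61.0)  # values above 61 are capped at 61
```

A sequence that uses exactly one codon per amino acid gives the lower bound of 20, while perfectly even usage of all synonymous codons gives the upper bound of 61, matching the range described above.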

2.4. ENC-Plot Analysis

To determine the major factors affecting codon usage bias, an ENC-plot was constructed with the ENC values plotted against the GC3S values. If the points lie on or around the standard curve, the codon usage of the given genes is constrained only by mutational pressure. Otherwise, the codon usage pattern is influenced by other factors, such as natural selection. The standard ENC values were calculated using the equation [12]

$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}},$$

where s represents the given (G+C)3S% value, expressed as a fraction.
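As an illustration of how a point is compared with the standard curve, the sketch below assumes that the curve is Wright's expected ENC, 2 + s + 29/(s² + (1 − s)²), with s the GC3S value as a fraction. The function names `expected_enc` and `below_curve` are hypothetical helpers, not part of CodonW.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC when codon usage is shaped by GC3s composition alone."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def below_curve(enc: float, gc3s: float) -> bool:
    """True when an observed ENC lies below the expected curve."""
    return enc < expected_enc(gc3s)


if __name__ == "__main__":
    print(expected_enc(0.5))  # 60.5, the maximum of the curve
    # Tamana bat virus, using the ENC (48.75) and GC3s (0.345) reported for it
    print(below_curve(48.75, 0.345))
```

A point below the curve means the observed bias is stronger than GC3S composition alone would predict, which is how the plot is read in this analysis.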

2.5. Neutrality Plot Analysis

The neutrality plot, also called neutral evolution analysis, is used to compare the influences of mutation pressure and natural selection on the codon usage patterns of the viral coding sequences by plotting the GC12 values of the synonymous codons against the GC3 values [7]. The GC12 and GC3 values of the Flaviviridae viruses were calculated with the EMBOSS CUSP program and then subjected to neutrality plot analysis.
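The regression behind a neutrality plot can be sketched as an ordinary least-squares fit of GC12 on GC3. The code below is a minimal illustration under that assumption; `neutrality_fit` and `pressure_shares` are hypothetical names, and the reading of the slope as the mutation-pressure share (with the remainder attributed to selection) follows the interpretation used later in this article rather than a general rule.

```python
def neutrality_fit(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x): returns (slope, intercept, r_squared)."""
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


def pressure_shares(slope: float):
    """Return (mutation %, selection %) under the slope-as-mutation-share reading."""
    return 100.0 * slope, 100.0 * (1.0 - slope)


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    gc3 = [0.40, 0.45, 0.50, 0.55]
    gc12 = [0.44, 0.445, 0.45, 0.455]
    print(neutrality_fit(gc3, gc12))
    print(pressure_shares(0.062))
```

A slope near 0 means GC12 barely follows GC3, which is taken as evidence that mutation pressure alone does not explain the composition.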

2.6. Relative Synonymous Codon Usage (RSCU) Analysis

The RSCU values of the coding sequences were analyzed to characterize the synonymous codon usage pattern independently of amino acid composition and coding region size, following a described method [7]. The RSCU values were calculated as follows:

$$\mathrm{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}},$$

where x_ij represents the number of occurrences of the j-th codon for the i-th amino acid and n_i represents the number of synonymous codons (the degeneracy) for that amino acid, which ranges from 1 to 6.
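A small sketch may help show what the RSCU calculation does in practice: each codon count is divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 marks a codon used more often than expected under equal usage. The function name `rscu` is hypothetical, the standard genetic code is assumed, and Met, Trp, and stop codons are left out so that at most 59 codons are returned.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict:
    """Return RSCU per synonymous codon for one in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Synonymous families only: skip stops, Met (M), and Trp (W).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            # observed count divided by the mean count within the family
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    # Toy sequence (hypothetical): four alanine codons, GCT x3 and GCC x1
    print(rscu("GCTGCTGCTGCC"))
```

Within one amino acid the RSCU values always sum to the number of synonymous codons, which is why the four alanine codons in the toy example sum to 4.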

2.7. Correspondence Analysis

Correspondence analysis (CoA) is an effective method for identifying the major trends in codon usage patterns among viral coding sequences [5]. Each coding region was represented as a 59-dimensional vector corresponding to the RSCU value of each synonymous codon (excluding AUG, UGG, and the stop codons). In this research, the CoA of the Flaviviridae viruses was performed with CodonW.

2.8. Correlation Analysis

Correlation analysis was carried out with the statistical software SPSS 22 to identify the factors influencing synonymous codon usage patterns [7]. The parameters of the viruses were obtained from the EMBOSS CUSP program and CodonW.
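The text does not say which correlation coefficient was computed in SPSS, so the sketch below assumes Pearson's r purely for illustration; a rank-based coefficient would be computed differently. The function name `pearson_r` and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical ENC and GC3s values for four sequences
    enc = [48.8, 52.1, 54.5, 56.9]
    gc3s = [0.35, 0.42, 0.44, 0.45]
    print(pearson_r(enc, gc3s))
```

In a real analysis each parameter (for example ENC, GC3S, Gravy, or Aromo) would be one series with one value per virus.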

2.9. Phylogenetic Analysis

The evolutionary processes of viruses significantly influence their codon usage pattern [13]. To determine the evolutionary relationships between different viruses, phylogenetic analysis based on the nucleotide sequences of the viral coding regions was performed using MEGA7 software.

3. Results

3.1. Nucleotide Composition of 65 Flaviviridae Viruses

The nucleotide content of the 65 Flaviviridae coding sequences was calculated; the mean values of A%, U%, G%, C%, and GC% are given in the last row of Table 2. To gain insight into the potential role of composition in shaping the codon usage pattern, the base contents at the third codon position were also calculated: A3S%, U3S%, G3S%, C3S%, and GC3S% in these viruses were 33.11±0.0405 (mean ± SD), 34.54±0.0253, 27.01±0.0104, 29.14±0.0275, and 44.83±0.0508, respectively. U3S% was distinctly high and G3S% was the lowest among the third-position base contents (Table 2). Relative to the human reference set, the CAI values of the Flaviviridae viruses range from 0.673 to 0.740, with an average of 0.714 and an SD of 0.0163 (Table 1).

Virus strain | A | U | G | C | A3S | U3S | G3S | C3S | GC
Aedes flavivirus | 0.2528 | 0.2467 | 0.2621 | 0.2384 | 0.2991 | 0.2994 | 0.279 | 0.3412 | 0.501
Alkhurma virus | 0.2436 | 0.2157 | 0.371 | 0.2237 | 0.2976 | 0.3617 | 0.2773 | 0.2729 | 0.553
Apoi virus | 0.2763 | 0.2754 | 0.2754 | 0.2075 | 0.3367 | 0.3906 | 0.2441 | 0.2722 | 0.497
Bagaza virus | 0.2916 | 0.2125 | 0.2929 | 0.2129 | 0.3307 | 0.3727 | 0.2501 | 0.2893 | 0.508
Banzi virus | 0.2727 | 0.2232 | 0.2816 | 0.2225 | 0.3089 | 0.2731 | 0.3045 | 0.338 | 0.504
Bouboui virus | 0.2889 | 0.2297 | 0.2698 | 0.2116 | 0.3351 | 0.3785 | 0.2452 | 0.2867 | 0.495
Bussuquara virus | 0.2842 | 0.2174 | 0.2830 | 0.2155 | 0.3601 | 0.2576 | 0.3067 | 0.3066 | 0.499
Cell fusing agent virus | 0.2443 | 0.2459 | 0.2744 | 0.2355 | 0.3115 | 0.3796 | 0.2407 | 0.2843 | 0.522
Chaoyang virus | 0.2901 | 0.225 | 0.2704 | 0.2126 | 0.33 | 0.3704 | 0.2491 | 0.3071 | 0.495
Culex flavivirus | 0.2349 | 0.2357 | 0.2973 | 0.2321 | 0.2501 | 0.2545 | 0.397 | 0.3328 | 0.529
Dengue virus 1 | 0.3191 | 0.2142 | 0.26 | 0.2087 | 0.3547 | 0.3652 | 0.234 | 0.3045 | 0.476
Dengue virus 2 | 0.3317 | 0.2111 | 0.2529 | 0.2043 | 0.4714 | 0.2334 | 0.2451 | 0.2871 | 0.457
Dengue virus 3 | 0.3219 | 0.2141 | 0.2586 | 0.2053 | 0.4462 | 0.2457 | 0.2693 | 0.2763 | 0.464
Dengue virus 4 | 0.3105 | 0.2201 | 0.2631 | 0.2063 | 0.4249 | 0.2473 | 0.2757 | 0.2825 | 0.47
Donggang virus | 0.2707 | 0.245 | 0.2719 | 0.2124 | 0.3152 | 0.3769 | 0.2511 | 0.3051 | 0.501
Edge Hill virus | 0.2847 | 0.2389 | 0.2842 | 0.1921 | 0.3288 | 0.3993 | 0.2482 | 0.2822 | 0.491
Entebbe bat virus | 0.2783 | 0.2159 | 0.2806 | 0.2252 | 0.3473 | 0.383 | 0.2352 | 0.279 | 0.516
Gadgets Gully virus | 0.2624 | 0.2166 | 0.313 | 0.2079 | 0.2986 | 0.3771 | 0.2743 | 0.2757 | 0.532
Hanko virus | 0.2691 | 0.2657 | 0.2632 | 0.2019 | 0.3305 | 0.3314 | 0.3091 | 0.2726 | 0.465
Ilheus virus | 0.2689 | 0.2082 | 0.2797 | 0.2432 | 0.3231 | 0.3587 | 0.2462 | 0.3097 | 0.535
Japanese encephalitis ORF | 0.2776 | 0.2082 | 0.2836 | 0.2306 | 0.3342 | 0.2401 | 0.2989 | 0.3489 | 0.516
Jugra virus | 0.2901 | 0.2282 | 0.2698 | 0.2119 | 0.3328 | 0.3843 | 0.2382 | 0.2915 | 0.493
Kadam virus | 0.2547 | 0.22 | 0.3107 | 0.2156 | 0.3057 | 0.3668 | 0.286 | 0.2733 | 0.539
Kamiti River virus | 0.2514 | 0.2487 | 0.2659 | 0.234 | 0.3214 | 0.3984 | 0.2325 | 0.2717 | 0.511
Karshi virus | 0.2431 | 0.2135 | 0.3211 | 0.2222 | 0.3026 | 0.3702 | 0.2723 | 0.2739 | 0.555
Kedougou virus | 0.2431 | 0.2135 | 0.3211 | 0.2222 | 0.2587 | 0.2401 | 0.3928 | 0.3261 | 0.544
Kokobera virus | 0.2813 | 0.2252 | 0.2892 | 0.21 | 0.3492 | 0.2586 | 0.3173 | 0.3049 | 0.496
Langat virus | 0.2447 | 0.212 | 0.3231 | 0.2202 | 0.2897 | 0.3706 | 0.2812 | 0.2773 | 0.555
Louping ill virus | 0.2447 | 0.2072 | 0.3213 | 0.2267 | 0.2695 | 0.2319 | 0.3766 | 0.3314 | 0.548
Meaban virus | 0.2444 | 0.2136 | 0.3308 | 0.2112 | 0.3005 | 0.3715 | 0.2927 | 0.2632 | 0.556
Mercadeo virus | 0.2506 | 0.2437 | 0.2758 | 0.2299 | 0.2708 | 0.2793 | 0.3449 | 0.3498 | 0.506
Modoc virus | 0.2962 | 0.2516 | 0.2713 | 0.1809 | 0.3502 | 0.4155 | 0.2285 | 0.2741 | 0.467
Montana myotis leukoencephalitis virus | 0.2981 | 0.2634 | 0.2602 | 0.1783 | 0.3849 | 0.3809 | 0.2558 | 0.2181 | 0.439
Mosquito flavivirus | 0.2384 | 0.2395 | 0.2781 | 0.2435 | 0.3186 | 0.385 | 0.2311 | 0.2708 | 0.534
Murray Valley encephalitis virus | 0.2888 | 0.23 | 0.2748 | 0.2114 | 0.3319 | 0.3763 | 0.2476 | 0.3041 | 0.498
New Mapoon virus | 0.2635 | 0.2213 | 0.2706 | 0.2446 | 0.3134 | 0.365 | 0.2549 | 0.2954 | 0.53
Nounane virus | 0.2856 | 0.217 | 0.2725 | 0.2245 | 0.367 | 0.2474 | 0.2875 | 0.3315 | 0.497
Ntaya virus | 0.2856 | 0.217 | 0.2725 | 0.2245 | 0.3326 | 0.3788 | 0.2503 | 0.2914 | 0.493
Ochlerotatus caspius flavivirus | 0.2688 | 0.2588 | 0.2665 | 0.2059 | 0.3354 | 0.4017 | 0.2326 | 0.2797 | 0.483
Omsk hemorrhagic fever virus | 0.2566 | 0.2082 | 0.3127 | 0.2224 | 0.3058 | 0.3668 | 0.2723 | 0.2706 | 0.545
Palm Creek virus | 0.2636 | 0.2411 | 0.2878 | 0.2075 | 0.344 | 0.263 | 0.3588 | 0.2686 | 0.495
Paraiso Escondido virus | 0.2896 | 0.2355 | 0.2899 | 0.1851 | 0.3286 | 0.3904 | 0.2579 | 0.2847 | 0.487
Parramatta River virus | 0.2671 | 0.2595 | 0.2642 | 0.2093 | 0.3359 | 0.3938 | 0.2329 | 0.2816 | 0.485
Phnom Penh bat virus | 0.2871 | 0.2663 | 0.2716 | 0.1748 | 0.3384 | 0.4136 | 0.2593 | 0.2664 | 0.461
Powassan virus | 0.2516 | 0.2157 | 0.3128 | 0.2198 | 0.3052 | 0.369 | 0.2753 | 0.2685 | 0.545
Quang Binh virus | 0.2388 | 0.2385 | 0.2776 | 0.2451 | 0.2615 | 0.2645 | 0.3321 | 0.3681 | 0.523
Rio Bravo virus | 0.3025 | 0.2656 | 0.2502 | 0.1817 | 0.4023 | 0.3798 | 0.2283 | 0.2363 | 0.432
Saboya virus | 0.285 | 0.2385 | 0.2706 | 0.2059 | 0.3285 | 0.3872 | 0.2449 | 0.2944 | 0.492
Saumarez Reef virus | 0.2511 | 0.215 | 0.3173 | 0.2166 | 0.2973 | 0.374 | 0.2959 | 0.267 | 0.548
Sepik virus | 0.2899 | 0.2386 | 0.2638 | 0.2077 | 0.3407 | 0.3931 | 0.237 | 0.2877 | 0.483
Spanish goat encephalitis virus | 0.2442 | 0.2081 | 0.3233 | 0.2244 | 0.2982 | 0.3667 | 0.2776 | 0.273 | 0.559
Spondweni virus | 0.2442 | 0.2081 | 0.3233 | 0.2244 | 0.3183 | 0.3631 | 0.2483 | 0.3024 | 0.532
St. Louis encephalitis virus | 0.2918 | 0.2116 | 0.2801 | 0.2165 | 0.3399 | 0.3721 | 0.2471 | 0.293 | 0.507
Tamana bat virus | 0.3312 | 0.2839 | 0.2156 | 0.1692 | 0.4208 | 0.4285 | 0.1901 | 0.2618 | 0.402
Tembusu virus | 0.2866 | 0.2238 | 0.2894 | 0.2002 | 0.3304 | 0.3717 | 0.2567 | 0.2909 | 0.503
Tick-borne encephalitis virus | 0.247 | 0.212 | 0.3207 | 0.2202 | 0.3013 | 0.3668 | 0.2743 | 0.2765 | 0.552
Uganda S virus | 0.247 | 0.21 | 0.3207 | 0.2202 | 0.3802 | 0.3114 | 0.2594 | 0.2795 | 0.469
Usutu virus | 0.2707 | 0.2192 | 0.2819 | 0.2281 | 0.3117 | 0.2699 | 0.3012 | 0.3368 | 0.511
Wesselsbron virus | 0.2856 | 0.2395 | 0.2665 | 0.2085 | 0.3406 | 0.3906 | 0.2345 | 0.2926 | 0.487
West Nile virus lineage 1 | 0.2724 | 0.2155 | 0.2883 | 0.2238 | 0.3261 | 0.369 | 0.2552 | 0.2923 | 0.522
West Nile virus lineage 2 | 0.2727 | 0.2170 | 0.2827 | 0.2275 | 0.3141 | 0.2635 | 0.307 | 0.3417 | 0.51
Yaounde virus | 0.2884 | 0.2068 | 0.2849 | 0.2199 | 0.3712 | 0.2188 | 0.3192 | 0.3227 | 0.505
Yellow fever virus | 0.2704 | 0.2326 | 0.286 | 0.2109 | 0.3336 | 0.3969 | 0.2386 | 0.2839 | 0.509
Yokose virus | 0.3003 | 0.2298 | 0.2597 | 0.2103 | 0.354 | 0.3887 | 0.2234 | 0.2839 | 0.483
Zika virus | 0.2777 | 0.2146 | 0.2908 | 0.2169 | 0.3192 | 0.3704 | 0.2532 | 0.2952 | 0.52
Average value | 0.2732 | 0.2288 | 0.2849 | 0.21479 | 0.3310 | 0.3440 | 0.2705 | 0.2924 | 0.5052

3.2. The ENC-GC3s Plots Analysis

The mean ENC value of the viruses was 54.58, the highest was 57.83, and the lowest was 48.75; the ENC values of 61 viruses were greater than 50, and those of 4 viruses were less than 50 (Table 1). This indicates that codon usage bias in Flaviviridae viruses is low. To investigate the factors affecting codon usage bias in the Flaviviridae viruses, the ENC values were plotted against the GC3S values. In the ENC versus GC3S graph, the curve represents the ENC values expected if mutation were the only factor, and the points represent the actual ENC values of the coding sequences of the Flaviviridae viruses (Figure 1). According to the ENC-GC3S plot, all the viruses clustered together below the expected ENC curve, which indicates that, in addition to mutation pressure, other factors such as translational selection also influence the codon usage pattern of Flaviviridae virus coding sequences [14].

3.3. The RSCU Analysis

As shown in Table 3, most of the high-frequency codons for the 18 amino acids in the viruses are A/U-ended. For example, 53 viruses have high-frequency A/U-ended codons for phenylalanine, accounting for 83.07%; the corresponding proportions are 78.46% for isoleucine and 86.15% for valine. In other words, Flaviviridae viruses prefer A/U-ended codons (Figure 2).


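Alkhurma virus | 1.26 | 1.63 | 1.25 | 1.34 | 1.56 | 1.34 | 1.44 | 0.77 | 1.26 | 0.69 | 1.
Apoi virus | 1.11 | 1.49 | 1.70 | 1.48 | 1.35 | 1.43 | 1.67 | 1.16 | 1.28 | 0.51 | 1.15 | 1.39 | 1.
Bagaza virus | 1.13 | 1.89 | 1.41 | 1.37 | 1.52 | 1.49 | 1.26 | 0.81 | 1.21 | 0.56 | 1.
Banzi virus | 1.04 | 1.92 | 1.51 | 1.56 | 1.28 | 1.64 | 1.64 | 1.46 | 1.
Bouboui virus | 1.31 | 1.50 | 1.29 | 1.47 | 1.58 | 1.29 | 1.46 | 1.12 | 1.33 | 0.73 | 1.
Bussuquara virus | 1.12 | 1.69 | 1.17 | 1.70 | 1.29 | 1.46 | 1.49 | 1.54 | 1.
Cell fusing agent virus | 1.23 | 1.74 | 1.29 | 1.50 | 1.26 | 1.24 | 1.43 | 1.41 | 1.16 | 0.69 | 1.
Chaoyang virus | 1.24 | 1.58 | 1.27 | 1.36 | 1.43 | 1.19 | 1.48 | 0.90 | 1.19 | 0.60 | 1.
Culex flavivirus | 1.19 | 1.71 | 1.49 | 1.97 | 1.41 | 1.33 | 1.25 | 1.31 | 1.51 | 3.00 | 1.34 | 1.17 | 1.35 | 1.39 | 1.30 | 1.
Dengue virus 1 | 1.25 | 1.65 | 1.19 | 1.66 | 1.95 | 1.35 | 1.57 | 1.00 | 1.46 | 0.46 | 1.
Dengue virus 2 | 1.10 | 1.32 | 1.16 | 1.68 | 1.63 | 2.05 | 2.36 | 2.
Dengue virus 3 | 1.24 | 1.34 | 1.14 | 1.73 | 1.35 | 1.83 | 1.85 | 2.
Dengue virus 4 | 1.32 | 1.41 | 1.25 | 1.46 | 1.23 | 1.09 | 1.38 | 1.49 | 1.40 | 1.77 | 1.27 | 1.46 | 1.
Donggang virus | 1.30 | 1.73 | 1.36 | 1.58 | 1.75 | 1.24 | 1.48 | 1.80 | 1.03 | 1.87 | 1.24 | 1.34 | 1.
Edge Hill virus | 1.09 | 1.81 | 1.46 | 1.66 | 1.54 | 1.38 | 1.53 | 2.
Entebbe bat virus | 1.09 | 1.49 | 1.50 | 1.45 | 1.75 | 1.35 | 1.41 | 2.00 | 1.15 | 0.54 | 1.
Gadgets Gully virus | 1.
Hanko virus | 1.25 | 1.70 | 1.16 | 1.38 | 1.27 | 1.47 | 1.41 | 1.
Ilheus virus | 1.04 | 1.68 | 1.26 | 1.63 | 1.
Jugra virus | 1.14 | 1.68 | 1.59 | 1.48 | 1.33 | 1.31 | 1.54 | 1.30 | 1.
Kadam virus | 1.24 | 1.78 | 1.44 | 1.33 | 1.37 | 1.23 | 1.29 | 1.37 | 1.36 | 2.
Kamiti River virus | 1.17 | 1.73 | 1.68 | 1.45 | 1.53 | 1.44 | 1.26 | 1.07 | 1.57 | 2.38 | 1.
Karshi virus | 1.02 | 1.99 | 1.36 | 2.28 | 1.26 | 1.46 | 1.30 | 1.
Kedougou virus | 1.01 | 1.74 | 1.04 | 1.92 | 1.33 | 1.27 | 1.61 | 1.51 | 1.
Kokobera virus | 1.13 | 1.61 | 1.22 | 1.64 | 1.55 | 1.30 | 1.67 | 1.75 | 1.
Langat virus | 1.12 | 2.27 | 1.36 | 2.
Louping ill virus | 1.13 | 1.77 | 1.34 | 1.53 | 1.42 | 1.26 | 1.57 | 1.81 | 1.
Meaban virus | 1.12 | 1.46 | 1.50 | 1.39 | 1.
Mercadeo virus | 1.17 | 1.47 | 1.47 | 1.81 | 1.85 | 1.37 | 1.55 | 1.83 | 1.01 | 1.84 | 1.24 | 1.32 | 1.
Modoc virus | 1.36 | 1.77 | 1.37 | 1.55 | 1.69 | 1.52 | 1.94 | 1.83 | 1.09 | 3.00 | 1.32 | 1.
Montana myotis leukoencephalitis virus | 1.05 | 1.91 | 1.33 | 1.67 | 1.54 | 1.30 | 1.
Mosquito flavivirus | 1.29 | 1.83 | 1.45 | 1.40 | 1.65 | 1.37 | 1.35 | 2.
Murray Valley encephalitis virus | 1.02 | 1.54 | 1.33 | 1.19 | 1.66 | 1.26 | 1.87 | 1.88 | 1.65 | 2.
New Mapoon virus | 1.21 | 1.75 | 1.19 | 1.70 | 1.17 | 1.37 | 1.57 | 2.32 | 1.
Nounane virus | 1.47 | 1.64 | 1.39 | 1.60 | 1.57 | 1.18 | 1.32 | 2.
Ntaya virus | 1.26 | 1.71 | 1.30 | 1.56 | 1.33 | 1.30 | 1.44 | 1.53 | 1.
Ochlerotatus caspius flavivirus | 1.20 | 1.61 | 1.23 | 1.48 | 1.80 | 1.40 | 1.63 | 1.74 | 1.
Omsk hemorrhagic fever virus | 1.
Palm Creek virus | 1.26 | 1.66 | 1.47 | 1.76 | 1.22 | 1.34 | 1.42 | 1.92 | 1.49 | 1.98 | 1.36 | 1.
Paraiso Escondido virus | 1.32 | 1.85 | 1.29 | 1.76 | 1.08 | 1.12 | 1.35 | 1.48 | 1.33 | 1.99 | 1.21 | 1.57 | 1.12 | 1.41 | 1.15 | 1.53 | 1.14 | 1.40 | 1.26
Parramatta River virus | 1.18 | 1.63 | 1.64 | 1.57 | 1.81 | 1.55 | 1.21 | 1.78 | 1.14 | 1.84 | 1.01 | 1.30 | 1.16 | 1.03 | 1.31 | 1.26 | 1.28 | 2.31 | 1.29
Phnom Penh bat virus | 1.22 | 1.81 | 1.24 | 1.37 | 1.72 | 1.31 | 1.26 | 1.79 | 1.
Powassan virus | 1.19 | 1.58 | 1.72 | 1.42 | 1.41 | 1.35 | 1.09 | 1.17 | 1.41 | 3.00 | 1.36 | 1.06 | 1.32 | 1.
Quang Binh virus | 1.09 | 1.60 | 1.35 | 1.47 | 1.81 | 1.45 | 1.89 | 2.
Rio Bravo virus | 1.22 | 1.70 | 1.33 | 1.30 | 1.45 | 1.30 | 1.56 | 1.58 | 1.30 | 1.87 | 1.26 | 1.34 | 1.
Saboya virus | 1.16 | 1.77 | 1.19 | 1.51 | 1.49 | 1.45 | 1.46 | 1.57 | 1.
Saumarez Reef virus | 1.18 | 1.73 | 1.30 | 1.62 | 1.58 | 1.25 | 1.39 | 1.67 | 1.03 | 1.96 | 1.31 | 1.44 | 1.
Sepik virus | 1.38 | 1.70 | 1.08 | 1.42 | 1.71 | 1.47 | 1.38 | 1.59 | 1.
Spanish goat encephalitis virus | 1.12 | 1.53 | 1.20 | 1.39 | 1.68 | 1.28 | 1.45 | 1.82 | 1.23 | 2.38 | 1.
Spondweni virus | 1.25 | 1.93 | 1.57 | 1.42 | 1.51 | 1.42 | 1.49 | 2.
St. Louis encephalitis virus | 1.30 | 1.42 | 1.80 | 1.59 | 1.45 | 1.78 | 1.63 | 1.96 | 1.01 | 1.55 | 1.29 | 1.62 | 1.27 | 1.33 | 1.26 | 1.42 | 1.43 | 3.07 | 1.35
Tamana bat virus | 1.36 | 1.74 | 1.52 | 1.33 | 1.55 | 1.43 | 1.29 | 1.87 | 1.13 | 1.79 | 1.33 | 1.10 | 1.01 | 1.32 | 1.
Tembusu virus | 1.19 | 1.74 | 1.55 | 1.38 | 1.66 | 1.38 | 1.38 | 1.68 | 1.11 | 2.34 | 1.
Tick-borne encephalitis virus | 1.11 | 1.51 | 1.16 | 1.47 | 1.36 | 1.
Uganda S virus | 1.03 | 1.74 | 1.21 | 1.43 | 1.25 | 1.40 | 1.59 | 1.38 | 1.17 | 1.50 | 1.
Usutu virus | 1.11 | 1.60 | 1.45 | 1.55 | 1.59 | 1.32 | 1.32 | 1.68 | 1.
Wesselsbron virus | 1.23 | 1.55 | 1.57 | 1.47 | 1.70 | 1.32 | 1.46 | 1.71 | 1.02 | 2.35 | 1.
West Nile virus lineage 1 | 1.08 | 1.81 | 1.42 | 1.67 | 1.37 | 1.26 | 1.62 | 1.52 | 1.
West Nile virus lineage 2 | 1.13 | 1.77 | 1.50 | 2.03 | 2.03 | 1.49 | 1.49 | 1.70 | 1.
Yaounde virus | 1.13 | 1.56 | 1.27 | 1.44 | 1.64 | 1.29 | 1.62 | 1.82 | 1.
Yellow fever virus | 1.31 | 1.57 | 1.44 | 1.75 | 1.40 | 1.25 | 1.50 | 1.79 | 1.
Yokose virus | 1.02 | 1.62 | 1.24 | 1.56 | 1.61 | 1.46 | 1.48 | 1.84 | 1.
Zika virus | 1.08 | 1.42 | 1.22 | 1.29 | 1.62 | 1.07 | 1.32 | 1.
Ratio of A/U-ended codons (%) | 84.62 | 61.45 | 76.92 | 63.07 | 84.62 | 81.54 | 98.46 | 76.92 | 69.23 | 84.62 | 87.69 | 73.85 | 46.15 | 78.46 | 64.61 | 83.82 | 78.46 | 83.07 | 93.85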

We performed CoA on the RSCU values, which revealed that the first, second, third, and fourth axes accounted for 50.68%, 9.16%, 3.51%, and 1.63% of the total variation, respectively. Thus, codon usage bias could be explained mainly by the first and second axes, whose values were plotted to show the distribution of synonymous codon usage patterns. Each point represents a virus; the closer two points are, the more similar the codon usage patterns of the corresponding viruses. As shown in Figure 3, the Flaviviridae viruses can be divided into two groups plus the remaining viruses: Group A includes Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and Wesselsbron virus, and Group B includes West Nile virus lineage 2, Japanese encephalitis virus, Usutu virus, and Kedougou virus.

3.4. Neutrality Plot Analysis

In the neutrality plot analysis (Figure 4), a significant positive correlation was observed between the GC12 and GC3 values of the Flaviviridae viruses (r2 = 0.06). The slope of the regression line was 0.062, which indicates that the relative contributions of mutation pressure and natural selection were 6.2% and 93.8%, respectively, demonstrating the dominant influence of natural selection [15]. In addition, these viruses can be grouped into two clusters, Group A (Yellow fever virus, Apoi virus, Tembusu virus, Dengue virus 1, and others) and Group B (West Nile virus lineage 2, Japanese encephalitis virus, Usutu virus, Kedougou virus, and others), which is similar to the result of the RSCU analysis.

3.5. Correlation Analysis

As shown in Table 4, the ENC values of the Flaviviridae viruses correlated significantly with A%, C%, G%, A3S%, C3S%, and GC3S%. Additionally, GC3S% correlated significantly with GC%. These data suggest that nucleotide composition constraints influence synonymous codon usage.

Variables AUGCGCGravyAromoENC


Note: Means p < 0.01.
Means 0.01 < p < 0.05.
N Means no correlation.

ENC values have significant negative correlations with Gravy and Aromo values. In addition, U3S%, G3S%, C3S%, and GC3S% have significant negative correlations with Gravy values, and A3S% has a significant negative correlation with Aromo values. These results indicate that natural selection, along with mutational pressure, also influenced codon usage bias.

3.6. Phylogenetic Analysis of Flaviviridae Viruses

To evaluate the effects of evolutionary processes on codon usage patterns, phylogenetic analysis was carried out. The results show that the 65 Flaviviridae viruses can be divided into two groups (Figure 5), Group I and Group II. Group I includes Kedougou virus, Louping ill virus, West Nile virus lineage 2, and Yaounde virus, and the range of their GC3S content is not extensive (0.364 ≤ GC3S ≤ 0.582). Group II includes Omsk hemorrhagic fever virus, Alkhurma virus, Tick-borne encephalitis virus, and Spanish goat encephalitis virus, and the range of their GC3S content is relatively smaller (0.345 ≤ GC3S ≤ 0.454). These results suggest that the more closely related the viruses are, the more similar their codon usage bias is.

4. Discussion

The study of viral codon usage patterns can reveal useful information about overall viral survival, fitness, and evolution [6]. In this research, the majority of Flaviviridae viruses have a weak codon bias, with a mean ENC value of 54.58. This is in accordance with earlier studies on Tembusu virus and West Nile virus, which also have a low codon usage bias [16–18]. According to the CodonW results (Table 2), the contents of A and G are the highest, and RSCU analysis indicates that Flaviviridae viruses prefer A/U-ended codons.

Compared with other RNA viruses, such as polioviruses, H5N1 influenza virus, and SARS-CoVs, which have mean ENC values of 53.75, 50.91, and 48.99, respectively [19–21], we conjecture that weak codon bias in RNA viruses is advantageous for efficient replication in host cells [22]. As the ENC-GC3S plot analysis shows, mutational pressure and other factors shaped the codon usage patterns of Flaviviridae viruses, which is similar to hepatitis C virus [22]. In fact, Hongju et al. previously reported that the codon usage bias of ZIKV is weak and that the factors influencing its pattern include not only mutation pressure but also translational selection, aromaticity, and hydrophobicity [14]. Although previous studies on Zika virus [14, 23] observed greater frequencies of A3S/G3S than U3S, some viruses showed the contrary characteristic; for example, in Aedes flavivirus U3S% was 0.2994 and G3S% was 0.279, and in Alkhurma virus U3S% was 0.3617 and G3S% was 0.2773. Taking all results together, U3S% was the highest overall and G3S% was the lowest. Since Flaviviridae viruses prefer A/U-ended codons and A3S% has a remarkable correlation with ENC (Table 4), we think that the compositional constraint shaping synonymous codon bias comes from the content of nucleotides A and U at the third codon position. This result differs from many reports in which the compositional constraints influencing codon usage bias come from G and C contents (Zhou et al. 2004) [20, 24]. In addition, the correlations of both Gravy and Aromo values with ENC values are significant, which indicates a role for natural selection in shaping the codon usage patterns of the Flaviviridae viruses [6]. Furthermore, the codon usage patterns of this family were influenced by natural selection, which accounts for 93.8%, and mutation pressure, which accounts for 6.2% (Figure 4).

In the CoA-RSCU analysis, the Flaviviridae viruses can be divided into two groups plus the remaining viruses, and viruses with similar codon usage patterns cluster together. This is similar to the results of the neutrality plot analysis and the phylogenetic tree. Overall, Yellow fever virus, Apoi virus, Tembusu virus, and Dengue virus 1 always clustered together.

In summary, combining the nucleotide composition analysis, ENC-plot analysis, and correlation analysis, it is clear that both mutation pressure and natural selection influence the codon usage patterns of Flaviviridae viruses. In addition, most of the Flaviviridae viruses can be classified into two categories according to the findings of the CoA-RSCU analysis, neutrality plot analysis, and phylogenetic analysis. Codon usage patterns were similar between different virus species in the same group.

5. Conclusion

In this study, the majority of Flaviviridae viruses were found to have a weak codon usage bias, which may help them adapt to diverse hosts or varied environments. The Flaviviridae viruses can also be classified into two groups according to their codon usage patterns. Their codon usage patterns were influenced by natural selection, which accounts for 93.8%, and mutation pressure, which accounts for 6.2%. The information from this research may not only help in understanding the evolution of Flaviviridae viruses but also have potential value for developing vaccines against them.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no competing interests.


Acknowledgments

This work was supported by research grants from the Department of Education of Sichuan Province, China (13ZB0294), and Sichuan Agricultural University (00770114).


References

  1. M. Archetti, “Codon usage bias and mutation constraints reduce the level of error minimization of the genetic code,” Journal of Molecular Evolution, vol. 59, no. 2, pp. 258–266, 2004.
  2. P. M. Sharp, T. M. F. Tuohy, and K. R. Mosurski, “Codon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes,” Nucleic Acids Research, vol. 14, no. 13, pp. 5125–5143, 1986.
  3. T. Lesnik, J. Solomovici, A. Deana, R. Ehrlich, and C. Reiss, “Ribosome traffic in E. coli and regulation of gene expression,” Journal of Theoretical Biology, vol. 202, no. 2, pp. 175–185, 2000.
  4. K. Szuhan, L. Yingray, L. Chingyen et al., “Dengue virus-induced ER stress is required for autophagy activation, viral replication, and pathogenesis both in vitro and in vivo,” Scientific Reports, vol. 8, no. 1, 2018.
  5. L. R. Edgar, M. I. Salazar, M. J. Lopez, S. Juan, S. V. Alejandro, and G. Xianwu, “Large-scale genomic analysis of codon usage in dengue virus and evaluation of its phylogenetic dependence,” BioMed Research International, vol. 2014, Article ID 851425, 9 pages, 2014.
  6. N. K. Singh, A. Tyagi, R. Kaur, R. Verma, and P. K. Gupta, “Characterization of codon usage pattern and influencing factors in Japanese encephalitis virus,” Virus Research, vol. 221, pp. 58–65, 2016.
  7. M. B. Azeem, N. Izza, Q. Raheel, and T. Yigang, “Evolution of codon usage in zika virus genomes is host and vector specific,” Emerging Microbes and Infections, vol. 5, no. 10, p. e107, 2016.
  8. S. S. Fatmi, R. Zehra, and D. O. Carpenter, “Powassan virus-a new reemerging tick-borne disease,” Frontiers in Public Health, vol. 5, 2017.
  9. J. J. V. Lindern, S. Aroner, N. D. Barrett, J. A. Wicker, C. T. Davis, and A. D. Barrett, “Genome analysis and phylogenetic relationships between east, central and west African isolates of Yellow fever virus,” Journal of General Virology, vol. 87, no. 4, pp. 895–907, 2006.
  10. A. D. Haddow, F. Nasar, H. Guzman et al., “Genetic characterization of spondweni and zika viruses and susceptibility of geographically distinct strains of aedes aegypti, aedes albopictus and culex quinquefasciatus (diptera: culicidae) to spondweni virus,” PLOS Neglected Tropical Diseases, vol. 10, no. 10, Article ID e0005083, 2016.
  11. J. F. Peden, “Analysis of Codon Usage,” University of Nottingham, vol. 90, no. 1, pp. 73-74, 2000.
  12. N. Kumar, B. C. Bera, B. D. Greenbaum et al., “Revelation of influencing factors in overall codon usage bias of equine influenza viruses,” PLoS ONE, vol. 11, no. 4, Article ID e0154376, 2016.
  13. A. Insung and S. Hyeonseok, “Evolutionary analysis of human-origin influenza A virus (H3N2) genes associated with the codon usage patterns since 1993,” Virus Genes, vol. 44, no. 2, pp. 198–206, 2012.
  14. H. Wang, S. Liu, B. Zhang, and W. Wei, “Analysis of synonymous codon usage bias of zika virus and its adaption to the hosts,” PLoS ONE, vol. 11, no. 11, Article ID e0166260, 2016.
  15. Y. Yuan, S. H. Huang, C. K. Wang, and H. J. Zhi, “Analysis on codon usage and evolution of soybean mosaic virus,” Soybean Science, 2014.
  16. H. Zhou, B. Yan, S. Chen, M. Wang, R. Jia, and A. Cheng, “Evolutionary characterization of Tembusu virus infection through identification of codon usage patterns,” Infection, Genetics and Evolution, vol. 35, pp. 27–33, 2015.
  17. Y.-P. Ma, Z.-W. Zhou, Z.-X. Liu et al., “Codon usage bias of the phosphoprotein gene of spring viraemia of carp virus and high codon adaptation to the host,” Archives of Virology, vol. 159, no. 7, pp. 1841–1847, 2014.
  18. X. X. Ma, Y. P. Feng, J. L. Liu et al., “Characteristics of synonymous codon usage bias in the beginning region of West Nile virus,” Genetics and Molecular Research, vol. 13, no. 3, pp. 7347–7355, 2014.
  19. Z. Jie, W. Meng, W. Q. Liu et al., “Analysis of codon usage and nucleotide composition bias in polioviruses,” Virology Journal, vol. 8, no. 1, p. 146, 2011.
  20. T. Zhou, W. Gu, J. Ma, X. Sun, and Z. Lu, “Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses,” BioSystems, vol. 81, no. 1, pp. 77–86, 2005.
  21. W. Gu, T. Zhou, J. Ma, X. Sun, and Z. Lu, “Analysis of synonymous codon usage in SARS coronavirus and other viruses in the nidovirales,” Virus Research, vol. 101, no. 2, pp. 155–161, 2004.
  22. J.-S. Hu, Q.-Q. Wang, J. Zhang et al., “The characteristic of codon usage pattern and its evolution of hepatitis C virus,” Infection, Genetics and Evolution, vol. 11, no. 8, pp. 2098–2102, 2011.
  23. N. A. Rahman and I. Huhtaniemi, “Zika virus infection—do they also endanger male fertility?” Science China Life Sciences, vol. 60, no. 3, pp. 324-325, 2017.
  24. S. Zhao, Q. Zhang, X. Liu et al., “Analysis of synonymous codon usage in 11 Human Bocavirus isolates,” BioSystems, vol. 92, no. 3, pp. 207–214, 2008.

Copyright © 2019 Huipeng Yao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
