Abstract

The Baculoviridae is a large group of insect viruses containing circular double-stranded DNA genomes of 80 to 180 kbp. In this study, genome sequences from 57 baculoviruses were analyzed to reevaluate the number and identity of core genes and to understand the distribution of the remaining coding sequences. Thirty-one core genes with orthologs in all genomes were identified, along with 895 other genes that differ in their degree of representation among the reported genomes. Many of these latter genes are common to well-defined lineages, whereas others are unique to one or a few of the viruses. Phylogenetic analyses based on core gene sequences and the gene composition of the genomes supported the current division of the Baculoviridae into 4 genera: Alphabaculovirus, Betabaculovirus, Gammabaculovirus, and Deltabaculovirus.

1. Background

Baculoviruses are arthropod-specific viruses containing large double-stranded circular DNA genomes of 80,000–180,000 bp. Progeny generation is biphasic, with two different phenotypes produced during virus infection: budded viruses (BVs), during the initial stage of the multiplication cycle, and occlusion-derived viruses (ODVs), at the final stages of replication [1, 2]. In general, primary infection takes place in the insect midgut cells after ingestion of occlusion bodies (OBs). Following this stage, systemic infection is caused by the initial BV progeny [3, 4]. Finally, OBs are produced during the last stage of the infection. These OBs comprise virions embedded in a protein matrix which protects them from the environment [5, 6].

Baculoviruses have been used extensively in many biological applications such as protein expression systems, models of genetic regulatory networks and genome evolution, putative nonhuman viral vectors for gene delivery, and biological control agents against insect pests [7–17].

The Baculoviridae family is divided into four genera according to common biological and structural characteristics: Alphabaculovirus, which includes lepidopteran-specific baculoviruses and is subdivided into Group I and Group II based on the type of fusogenic protein; Betabaculovirus, comprising lepidopteran-specific granuloviruses; Gammabaculovirus, which includes hymenopteran-specific baculoviruses; and finally Deltabaculovirus, which, to date, comprises only CuniNPV and possibly the still undescribed dipteran-specific baculoviruses [1, 18–20].

The comparison of the known genome sequences of all baculoviruses has been the source for identifying a common set of genes, the baculovirus core genes. However, there are probably more orthologous sequences that cannot be identified because of the accumulation of many mutations throughout evolution. Core genes seem to be a key factor for some of the main biological functions, such as those necessary to transcribe viral late genes, produce virion structure, infect gut cells, abrogate host metabolism, and establish infections [21–24].

For this report, previous data and bioinformatic studies conducted on the currently available set of completely sequenced baculovirus genomes were taken into account. The result is a summary of gene content and a set of phylogenetic analyses which validate the classification of this important viral family.

2. Baculovirus Ancestral Genes

There are currently 57 complete baculovirus genomes deposited in GenBank (Table 1). These include 41 Alphabaculoviruses, 12 Betabaculoviruses, 3 Gammabaculoviruses, and 1 Deltabaculovirus.

As a first approach to a comparative analysis, the GC content of each genome was calculated (Figure 1). The histogram revealed that many baculoviruses have a GC content of about 41%, although several of them have significantly higher values (CfMNPV at 50.1%, CuniNPV at 50.9%, AnpeNPV-L2 at 53.5%, AnpeNPV-Z at 53.5%, LyxyNPV at 53.5%, OpMNPV at 55.1%, and LdMNPV at 57.5%). A detailed analysis of DNA content did not show a clear pattern of GC content that could be associated with each genus.
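The GC calculation described above can be illustrated with a minimal Python sketch. The function name `gc_content`, the choice to ignore ambiguous bases, and the toy sequences are assumptions made for this example; they are not taken from the study, which does not state how the values were computed.

```python
def gc_content(sequence):
    """Return the GC percentage of a DNA sequence.

    Assumption for this sketch: only unambiguous bases (A, C, G, T)
    are counted; any other symbol (e.g. N) is ignored.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Toy sequences for illustration only; these are not real genome data.
toy_genomes = {
    "virus_A": "ATGCGCGCAT",
    "virus_B": "ATATATGCAT",
}

for name, seq in toy_genomes.items():
    print(f"{name}: {gc_content(seq):.1f}% GC")
```

With real data, each complete genome sequence would be read from its GenBank or FASTA record and passed to the same function, and the resulting percentages binned into a histogram.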

Further characterization of the patterns of gene content and organization may prove useful for establishing evolutionary relationships among members of Baculoviridae. High variability in the number of coding sequences is a key feature of viruses with large DNA genomes that infect eukaryotic cells [18]. Insertions, deletions, duplication events, and/or sequence reorganizations by recombination or transposition seem to be the main forces of macroevolution in this kind of biological entity. For example, the loss or gain of genetic material could provide important new abilities for the colonization of new hosts, or could improve performance within established hosts. However, there seems to be a set of core genes whose absence would imply the loss of basic biological functions, and which could be typical of the viral family. In view of this, and considering previous reports [1, 19, 22, 23], the number and identity of baculovirus common genes were reevaluated (Table 2). As a result, P6.9 and Desmoplakin were recognized in this work as core proteins by using sequence analyses complementary to the standard ones (see Supplementary files available at http://dx.doi.org/10.4061/2011/379424).

The group of conserved sequences found in all baculovirus genomes is consistently estimated at about 30 shared genes, regardless of the increasing number of genomes analyzed [22, 148]. Meanwhile, the role or function assigned to several sequences has been revised according to new studies. In particular, the 38k (Ac98) gene has been found to encode a protein which is part of the capsid structure [121, 122]; P33 (Ac92) is a sulfhydryl oxidase which could be related to the proper production of virions in the infected cell nucleus [123–125]; ODV-EC43 (Ac109) is a structural component which would be involved in BV and ODV generation [126]; P49 (Ac142) is a capsid protein important in DNA processing, packaging, and capsid morphogenesis [129]; Ac81 interacts with Actin 3 in the cytoplasm but does not appear in BVs or in ODVs [135]; ODV-E18 (Ac143) would mediate BV production [131]; desmoplakin (Ac66) seems to be essential for release from the virogenic stroma to the cytoplasm [132]; PIF-4 (Ac96) and PIF-5 (ODV-56, Ac148) are ODV envelope proteins with an essential role in the per os infection route [145, 147]; and Ac68 may be involved in polyhedron morphogenesis [130].

The number and identity of shared orthologous genes in every accepted member of each genus were investigated, and the unique sequences typical of each clade as well as those shared between different phylogenetic groups were identified (Figure 2).

This analysis shows that the four accepted baculovirus genera have accumulated a large number of genes during evolution. Many of these sequences were probably incorporated into viral genomes prior to diversification, since they are found in members of different genera. In contrast, other genes are unique to each genus, suggesting that they were incorporated more recently, after diversification (Table 3). The possibility that genes found in only one genus represent ancestral baculovirus sequences that were deleted in the other lineages should also be considered. In any case, this analysis yielded a set of particular genes which could help to assign new baculoviruses with partial sequence information to the appropriate genus.
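The logic of separating core genes, genus-wide shared genes, and genus-specific genes can be sketched with set operations. This is a minimal illustration that assumes orthology has already been resolved, so that each genome is represented as a set of ortholog-group labels; the function names, genus labels, and gene labels below are hypothetical and are not the procedure or data used in this work.

```python
def shared_genes(genomes):
    """Genes present in every genome of the given {name: gene_set} mapping."""
    gene_sets = list(genomes.values())
    return set.intersection(*gene_sets) if gene_sets else set()


def all_genes(genomes):
    """Union of the gene sets of all genomes (the whole gene content)."""
    return set().union(*genomes.values())


def genus_specific_genes(genera):
    """For each genus, genes shared by all of its members and absent
    from every member of every other genus."""
    result = {}
    for genus, members in genera.items():
        others = set()
        for other_genus, other_members in genera.items():
            if other_genus != genus:
                others |= all_genes(other_members)
        result[genus] = shared_genes(members) - others
    return result


# Hypothetical toy data: genus -> virus -> set of ortholog-group labels.
toy_genera = {
    "genus_X": {"v1": {"g1", "g2", "g3"}, "v2": {"g1", "g2", "g4"}},
    "genus_Y": {"v3": {"g1", "g5", "g4"}, "v4": {"g1", "g5"}},
}

# Flatten to a single {virus: gene_set} mapping for family-wide analysis.
toy_family = {v: genes for members in toy_genera.values()
              for v, genes in members.items()}

print("core genes:", sorted(shared_genes(toy_family)))
print("whole gene content:", sorted(all_genes(toy_family)))
print("genus-specific:", {g: sorted(s)
                          for g, s in genus_specific_genes(toy_genera).items()})
```

The hard part in practice is the step assumed away here: deciding which ORFs in different genomes are orthologs, which depends on sequence similarity searches and manual curation.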

3. Whole Baculovirus Gene Content

The study of all genes reported in the 57 completely sequenced viral genomes revealed the existence of about 895 different ORFs, a set of sequences that might be called the whole baculovirus gene content. This high number of potential coding sequences contrasts with the gene content of individual family members, which ranges from 90 to 181 genes (Alphabaculovirus: 118–169; Betabaculovirus: 116–181; Gammabaculovirus: 90–93; Deltabaculovirus: 109), as well as with the proportion of core genes, which represents only 3%. This feature supports the hypothesis that structural mutations are of great importance in the macroevolution of viruses with large DNA genomes. With this in mind, the set of genes shared by all members of each baculovirus genus was compared with the whole gene content of that genus (Figure 3).

The analysis shows that Group I alphabaculoviruses and gammabaculoviruses have a lower diversity of gene content than the other lineages. This information, coupled with the significant number of genome sequences obtained from Group I alphabaculoviruses, suggests that this lineage constitutes the newest clade in baculovirus evolutionary history [149]. This is based on the assumption that Group I alphabaculoviruses have had less time to incorporate new sequences from different sources (host genomes, other viral genomes, bacterial genomes, etc.) since the appearance of their common ancestor.

4. Baculovirus Core Gene Phylogeny

Traditional attempts to infer relationships between baculoviruses relied on amino acid or nucleotide sequence analyses of single genes encoding proteins such as polyhedrin/granulin (the major component of OBs), the envelope fusion polypeptides known as F protein and GP64, or DNA polymerase, among many other examples [149–152].

For the most part, these evolutionary inferences agreed with more robust subsequent studies based on sequence analyses of sets of genes with homologous sequences in all baculoviruses. These newer approaches were based on the construction of common-protein concatemers, which were used to propose evolutionary patterns for baculoviruses [149].

Because a viral family consists of members that share a common pattern of genes and functions, and whose proliferation cycle continuously challenges viral viability, it is essential to take into account their greater or lesser tolerance to molecular changes. The molecular constraints on changes in core genes are different from those on other genes. Therefore, core genes should be considered the most ancestral genes, which may have diverged to a greater or lesser degree. Accordingly, a phylogenetic study was performed based on concatemers obtained from multiple alignments of the 31 proteins recognized in this work as core genes for the 57 available baculoviruses with sequenced genomes (Figure 4).
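The construction of a concatemer from per-gene multiple alignments can be sketched as follows. This is a simplified illustration that assumes each gene has already been aligned across all taxa; the function name `concatenate_alignments` and the toy alignments are hypothetical, and the alignment and tree-building software used in this work is not implied.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-gene aligned sequences into one concatemer per taxon.

    alignments: {gene: {taxon: aligned_sequence}}; genes are joined in
    the insertion order of the dictionary.
    taxa: taxa to include; every gene must contain all of them.
    """
    parts = {taxon: [] for taxon in taxa}
    for gene, aligned in alignments.items():
        missing = set(taxa) - set(aligned)
        if missing:
            raise ValueError(f"{gene}: missing taxa {sorted(missing)}")
        lengths = {len(aligned[taxon]) for taxon in taxa}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        for taxon in taxa:
            parts[taxon].append(aligned[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments ('-' marks an alignment gap).
toy_alignments = {
    "gene_1": {"virus_A": "MK-L", "virus_B": "MKAL"},
    "gene_2": {"virus_A": "QRS", "virus_B": "Q-S"},
}

concatemers = concatenate_alignments(toy_alignments, ["virus_A", "virus_B"])
for taxon, seq in concatemers.items():
    print(taxon, seq)
```

Requiring every taxon in every gene mirrors the definition of a core gene: only genes with an ortholog in all genomes can contribute a column block to every row of the concatenated alignment.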

The resulting cladogram reproduces the current baculovirus classification into 4 genera. Additionally, this approach consistently separates the alphabaculoviruses into two lineages: Group I and Group II. The same can be observed within Group I, where the presence of two different clades can be clearly inferred (clade a and clade b). These groupings are in accordance with previous reports [20, 150]. In Group II alphabaculoviruses, no clear clustering could be identified, so no subdivision is suggested.

In contrast, in the Betabaculovirus genus, it is possible to propose a separation into two different clades: clade a (XnGV, HearGV, PsunGV, SpliGV, AgseGV, and PlxyGV) and clade b (AdorGV, PhopGV, CpGV, CrleGV, PiraGV, and ChocGV).

Beyond the evolutionary inference based on core genes, one question remained: “is the tolerance to changes the same in all core genes?” To answer it, the variability of each core gene was analyzed individually by studying sequence distances for each baculovirus core gene (Figure 5).

The resulting order of core genes shows that pif-2 was the most conserved baculovirus ancestral sequence, whereas desmoplakin was the gene with evidence of greatest variability. This analysis reveals that genomes can be evolutionarily constrained in different ways depending on the proteins they encode.
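A per-gene variability ranking of this kind can be sketched with a simple distance measure. The example below uses the mean pairwise p-distance (the proportion of differing sites) as a stand-in; this metric, the function names, and the toy alignments are assumptions made for illustration, since the text does not specify which distance was used for Figure 5.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_pairwise_distance(aligned):
    """Average p-distance over all pairs in a {taxon: sequence} mapping."""
    pairs = list(combinations(aligned.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def rank_genes_by_variability(alignments):
    """Return (gene, mean distance) pairs, most conserved first."""
    scores = {gene: mean_pairwise_distance(aligned)
              for gene, aligned in alignments.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy alignments; not real baculovirus proteins.
toy_core_alignments = {
    "gene_variable": {"A": "MKLV", "B": "MRQV", "C": "TKQI"},
    "gene_conserved": {"A": "MKLV", "B": "MKLV", "C": "MKLI"},
}

for gene, score in rank_genes_by_variability(toy_core_alignments):
    print(f"{gene}: {score:.3f}")
```

The p-distance underestimates divergence between distant sequences because it ignores multiple substitutions at the same site, so a model-corrected distance would be a reasonable replacement in a real analysis.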

Gaining access to new hosts might be an important force in gene evolution. During an infection, genome variants carrying mutations introduced by errors of the replication/repair machinery could be quickly incorporated into the virus population if the nucleotide changes resulted in translated proteins with better biological performance. The DNA helicase gene, which has been considered an important host range factor, was the second most variable core sequence in this study [87]. In contrast, other sequences such as the pif-2 gene would not accumulate mutations, because the encoded protein might lose vital functions that are not necessarily associated with the nature of the host.

5. Conclusions

Baculoviridae is a large family of viruses which infect and kill insect species from different orders. The valuable applications of these viruses in several fields of life sciences encourage their constant study with the goal of understanding the molecular mechanisms involved in the generation of progeny in the appropriate cells as well as the processes by which they evolve. The establishment of solid bases to recognize their phylogenetic relationships is necessary to facilitate the generation of new knowledge and the development of better methodologies.

In view of this, many researchers have proposed and used different bioinformatic methodologies to identify genes as well as related baculoviruses. Some of them were based on gene sequences [150], gene content [17], or genome rearrangements [152]. In this work, a combination of core gene sequence and gene content analyses was applied to reevaluate the Baculoviridae classification. To our knowledge, this report is the first to identify the whole baculovirus gene content and the shared genes that are unique to different genera and subgenera. All this information should be taken into account when grouping and classifying new virus isolates and when proposing molecular methodologies to diagnose baculoviruses based on gene targets chosen according to gene variability and gene content.

Acknowledgments

This work was supported by research funds from Agencia Nacional de Promoción Científica y Técnica (ANPCyT) and Universidad Nacional de Quilmes. P. D. Ghiringhelli is a member of the Research Career of CONICET (Consejo Nacional de Investigaciones Científicas y Técnicas), M. N. Belaich holds a postdoctoral fellowship of CONICET, S. A. B. Miele holds a fellowship of CONICET, and M. J. Garavaglia holds a fellowship of CIC-PBA (Comisión de Investigaciones Científicas de la Provincia de Buenos Aires). The authors thank Lic. Javier A. Iserte, Lic. Betina I. Stephan, and Lic. Laura Esteban for their help with the paper. S. A. B. Miele and M. Javier Garavaglia contributed equally to this work.

Supplementary Materials

The supplementary text explains in detail the alternative bioinformatic approaches used to validate the recognition of core genes. It also contains a detailed table showing the numbers of homologous ORFs within the family Baculoviridae.
