#### Abstract

Viruses are infectious agents that can cause epidemics and pandemics. Understanding virus formation, evolution, stability, and interaction with host cells is of great importance to the scientific community and to public health. A virus complex together with its aquatic environment typically poses a formidable challenge to theoretical description and prediction. In this work, we propose a differential geometry-based multiscale paradigm to model complex biomolecular systems. In our approach, the differential geometry theory of surfaces and geometric measure theory are employed as a natural means to couple the macroscopic continuum domain of the fluid mechanical description of the aquatic environment with the microscopic discrete domain of the atomistic description of the biomolecule. A multiscale action functional is constructed as a unified framework to derive the governing equations for the dynamics at different scales. We show that the classical Navier-Stokes equation for the fluid dynamics and Newton's equation for the molecular dynamics can be derived from the least action principle. These equations are coupled through the continuum-discrete interface, whose dynamics is governed by potential driven geometric flows.

#### 1. Introduction

Viruses are omnipresent infectious agents that are about 100 times smaller than bacteria. Unlike bacteria, viruses cannot grow or reproduce outside a host cell [1–4]. More than 5000 types of viruses are known. Viruses have a long history of causing epidemics and pandemics. About 70% of Native Americans were killed by foreign diseases after the arrival of Columbus in the Americas. The Spanish flu pandemic lasted from 1918 to 1919 and killed about 100 million people, or 5% of the world's population in 1918. AIDS, a disease caused by HIV, has killed more than 25 million people since it was first recognized on June 5, 1981, and about 39 million people worldwide currently live with HIV. Virus infection processes, or virus life cycles, differ greatly among species, but there are six basic stages [1–4]: (i) selective attachment due to the interaction, binding, and/or fusion between the viral capsid surface and specific receptors on the host cellular surface; (ii) penetration of the virus into the host cell through membrane fusion or receptor-mediated endocytosis; (iii) release of the viral genomic nucleic acid in the host cell due to viral capsid degradation by viral or host enzymes; (iv) virus replication and assembly in the host cell; (v) posttranslational modification of the viral proteins; and finally, (vi) release of the virus from the host cell. For some viruses, such as HIV, the order of stages (v) and (vi) is reversed. The body uses two defense mechanisms, the innate immune system and cell-mediated immunity, to defend the host from infection by viruses or other organisms. The innate immune system terminates virus replication in the host cell by degrading or inhibiting the viral genetic material, DNA or RNA, through antibodies or other molecules that bind viral DNA/RNA. In cell-mediated immunity, killer cells known as T cells destroy the infected host cell and its close neighbors by recognizing the viral protein displayed on the cellular surface.

Recent advances in structural biology and microbiology have led to a rapidly growing body of virus structural data [5–7]. A striking feature of virus data is their size: a virus complex may involve tens of millions of atoms, with detailed information on atomic coordinates, types, and radii. Most virus structural data are collected via X-ray diffraction (X-ray), cryo-electron microscopy (cryoEM) [8], fiber diffraction, and nuclear magnetic resonance (NMR) techniques. There are a few major virus morphologies: spherical type, helical type, dihedral type, viral envelope type, and complex type. Most animal viruses are of spherical morphology with icosahedral symmetry [6]. Most virus structure information can be obtained from the Protein Data Bank (PDB; http://www.rcsb.org/pdb/home/home.do), the Virus Particle Explorer database [7] (VIPERdb; http://viperdb.scripps.edu/), and the Protein Quaternary Structure server (PQS; http://pqs.ebi.ac.uk/).

Currently, the prevention and control of epidemics and pandemics caused by infectious viruses, such as H1N1, HIV, SARS, and bird flu, are of paramount importance. Because an infection starts with the surface attachment between a virus and a host cell, it is important to construct and visualize the surface topology and morphology of viruses in order to understand the surface attachment and further interaction. This information is also crucial to understanding the molecular mechanism that gives rise to the assembly of virus capsids and to DNA or RNA packaging. Computer-based visualization can represent the results of explorations in an easy-to-comprehend form and facilitate convenient information retrieval. Visualization tools are often developed in close conjunction with imaging, data registration, simulation, and/or surface construction. Virus visualization plays a unique role in the understanding of virus infection processes, such as virus attachment to a host cell, binding and fusion between a virus capsid surface and a host cellular surface, and the penetration of a virus into a host cell. However, viruses are not directly visible because their sizes are on the order of tens of nanometers. Virus images are constructed from virus information, which is either collected with the modalities described above or generated by computer simulations. Therefore, surface/image construction is a part of virus visualization. Yu and Bajaj present a computational algorithm to segment asymmetric units of three-dimensional (3D) density maps of icosahedral viruses [9] and a computational approach to structural interpretation from reconstructed 3D electron microscopy (3D-EM) maps of viruses [10]. Some basic biomolecular surface methods are available in the visualization software packages Chimera (http://www.cgl.ucsf.edu/chimera/) and VMD (http://www.ks.uiuc.edu/Research/vmd/).

The difficulty of characterizing a virus complex lies not only in its massive number of atoms, or data sets, but also in its ever-present interactions. Except for enveloped viruses, which typically cover their capsids with envelopes derived from lipids and proteins of their host cell membranes, most viruses use their own capsids to interact with the environment and host cells. A viral capsid usually consists of many identical viral protein subunits that form the capsid by symmetric assembly. The interactions between viral protein subunits are strong, so that viral capsids are rigid enough to hold the viral genome material and protect their content. Viruses have adopted a number of strategies to maintain the stability and flexibility of viral capsids. For many small viruses, such as STMV, the subunit proteins generally touch each other only at their edges. Their capsid stability is achieved by strong nonbonding interactions (i.e., hydrogen bonding and van der Waals interactions) between the edges of subunit proteins. Some large viruses, such as BMV, have developed overlapping strategies to increase the capsid stability. Some viruses even use a few intricately intertwining layers to strengthen their capsids [11]. Virus capsids are further stabilized by their hydrophobic interaction with the aquatic environment. Clearly, the boundary profile of the virus complex is determined by the balance of all mechanical forces or, equivalently, by the energy minimization of the system.

One of the present authors, Wei, introduced some of the first high-order geometric flow equations for image analysis [12]. These equations have led to many applications [12–16]. Mathematical analysis of Wei's equations has recently been carried out in Sobolev space by Bertozzi and Greer [17–19], who proved the existence and uniqueness of the solution to a case with initial data and a regularized operator. Coupled geometric flow equations were introduced by Wei and Jia for image edge detection [13]. An evolution operator based single-step method was proposed by Wei, Wang, and their coworkers for image processing [14]. A partial differential equation approach to Connolly surfaces was proposed by Wei and his coworkers [20]. In such an approach, a geometric partial differential equation (PDE) is used to describe the solvent density flows. Most biological processes occur in water, which constitutes about 70% of body mass. Therefore, in general, the biomolecular surface morphology should be determined by the free energy minimization in the aquatic environment. Wei and his coworkers have addressed this question by considering a mean curvature flow model of biomolecular surfaces that minimize the surface free energy functional [21]. They have also recently introduced stochastic geometric flows to account for the random fluctuation and dissipation in density and pressure near the surface [22]. A general geometric flow structure, the potential driven geometric flows, was introduced [22]. Physical properties, such as free energy minimization (area decreasing) and incompressibility (volume preserving), were realized in new geometric evolution equations [22]. Computational techniques used in this surface analysis are quite similar to the level sets devised by Osher and Sethian [23–25]. Another efficient approach is the Euler-Lagrange formulation of surface variation developed by Chan and others [26, 27]. Interacting particle systems and point-based approaches have also been proposed for the modeling and animation of surfaces [28].

An unsolved problem in structural virology is the detailed molecular mechanism by which virus capsids assemble with the right size to accommodate the viral genetic material in the subsequent DNA/RNA packaging. Additionally, the process of virus attachment to its host cell, the fusion of the virus with the cellular membrane, and the dynamics of virus penetration into its host cell remain poorly understood. Prerequisites to resolving these questions are efficient computational and mathematical tools for modeling virus surface construction, evolution, and visualization, and for analyzing the interactions of a virus with its host cell. A typical virus has millions of atoms, while a large virus may have tens of millions of atoms. Huge viral data sets pose severe challenges to the theoretical understanding and prediction of virus dynamics and interactions. These challenges are considerably exacerbated by the fact that virus behavior and infectivity depend strongly on the physiological environment, in which water is the most common medium. This dramatically increases the number of degrees of freedom of a virus system. The real-time dynamic visualization of viral attachment, fusion, and penetration of a host cell in the aquatic environment requires microsecond or even millisecond simulation time and is technically intractable with full-atom models at present [11, 29]. In fact, even the elementary operations, that is, the construction of virus surfaces with physical models and the real-time visualization of virus morphology, present formidable challenges for applied mathematics and computer science.

Recently, one of the present authors, Wei, has developed a differential geometry-based multiscale paradigm to address some of the aforementioned challenges in the nonequilibrium dynamics of viruses, as well as of other complex chemical systems (for example, fuel/solar cells) and biological systems (for example, ion channels) [30]. In this approach, the differential geometry theory of surfaces and geometric measure theory are employed to couple the macroscopic continuum mechanical description of the aquatic environment with the microscopic discrete atomistic description of the macromolecule. Multiscale action functionals are constructed as a unified framework to derive the governing equations for the dynamics of different scales and different descriptions. The generalized Navier-Stokes equation for the fluid dynamics, the generalized Poisson-Boltzmann equation for electrostatic interactions, and Newton's equation for the molecular dynamics were derived by the least action principle. These equations are coupled through the micro-macro boundary, whose dynamics is governed by potential driven geometric flows.

The objective of the present work is threefold. First, we apply the differential geometry-based multiscale models to the formation and evolution of virus capsids, where the challenges originate from the large number of atoms and the variety of interactions in a virus system, including those with the aquatic environment. To dramatically reduce the number of degrees of freedom of a virus system, we treat the water molecules as a macroscopic continuum. However, we maintain an atomic description of the virus to allow optimal access to detailed biomolecular information. Secondly, we propose a new scale, the coarse-grained particles, to improve the earlier multiscale formalism [30]. Our new coarse-grained scale is based on the description of amino acid residues. This additional scale is necessary for excessively large viruses or macromolecules, and it efficiently reduces the number of degrees of freedom. Finally, to further reduce computational cost, we utilize virus symmetries to provide an optional reduction in data size. Viruses typically have only a few coding genes, and they make use of symmetries to reduce their genome size, because capsid genes are used repeatedly. Viruses apparently also make use of symmetry to achieve a high ratio of volume to surface area. As such, a virus can maintain the desirable mechanical and chemical stability without its own cell membrane or complex defense systems. Some of the proposed ideas are tested by their applications in virus surface formation, evolution, and visualization.

#### 2. Theory and Algorithms

In this section, differential geometry theory of surfaces and potential driven geometric flows are utilized to establish a multiscale paradigm for modeling and simulation of virus formation and evolution. Then, a coarse-grained virus model is formulated to further reduce the number of degrees of freedom. Finally, the use of symmetry in virus surface construction is discussed.

##### 2.1. Differential Geometry-Based Multiscale Model

###### 2.1.1. Multiscale Models of Virus Surface Formation and Evolution

A fundamental issue in biological modeling, and in data analysis, visualization, and dynamical representation, is how to deal with the tremendously large number of degrees of freedom resulting from various interactions. Under physiological conditions, a virus and its interacting environment may involve tens of millions of protein atoms and water molecules. In principle, the system can be described entirely at the microscopic scale, that is, by an atomistic description or an even more detailed description of electrons and nuclei. However, such an approach is not practical and does not readily provide theoretical predictions of the physical properties of the virus complex. It is impossible at present, and will remain formidably expensive in the near future, to describe all the aforementioned interactions in full atomic detail for a large virus system. On the other hand, a macroscopic description of the system is incapable of revealing the molecular and atomic information of the virus particle and its dynamics. We reduce the number of degrees of freedom of the virus complex by a differential geometry-based multiscale model. In our multiscale model, we describe the aquatic environment by a hydrodynamic continuum, that is, a macroscopic description. As such, we are able to dramatically reduce the number of degrees of freedom of the millions of surrounding water molecules. However, since the biomolecule or the virus is the object of interest, we describe the virus in atomic detail, that is, by a microscopic, discrete description. Additionally, we carefully consider the solvation process of the virus molecule. The virus surface tension and the mechanical work of immersing the virus into the solvent are considered in our model, in addition to the possible interactions between virus atoms and the aquatic environment. Finally, the force resulting from virus and solvent interactions is accounted for by the fluid motion, which is modeled as a viscous fluid.

In our differential geometry-based multiscale model, we use a hypersurface (characteristic) function to characterize the boundary between the virus and the solvent. As such, a value of one indicates the virus domain and a value of zero indicates the aquatic domain. However, at the atomic scale, the virus surface, or the flow boundary between the virus particle and the aquatic environment, cannot behave like the Heaviside function. Instead, the hypersurface function must take values between zero and one. Such a profile characterizes the boundary between the virus and the aquatic environment. In the rest of this section, we distinguish a macroscopic (continuum) position variable from the microscopic variables that give the positions of the discrete atoms or particles. The whole computational domain consists of the solvent domain and the virus molecule domain, which meet at the solvent-solute boundary.
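To make the idea of a smeared-out characteristic function concrete, the following minimal sketch builds one from atomic centers and radii. It is an illustration only, not the construction used in this work: the `tanh` profile, the `width` parameter, and the function name `smooth_characteristic` are assumptions introduced here.

```python
import numpy as np

def smooth_characteristic(points, centers, radii, width=0.5):
    """Smoothed indicator of a union of atomic spheres (illustrative assumption).

    Returns values in [0, 1]: close to 1 well inside the union of spheres,
    close to 0 far outside, and exactly 0.5 on the sphere surfaces.
    """
    points = np.asarray(points, dtype=float)    # shape (n_points, 3)
    centers = np.asarray(centers, dtype=float)  # shape (n_atoms, 3)
    radii = np.asarray(radii, dtype=float)      # shape (n_atoms,)
    # Distance from every point to every sphere surface (negative inside).
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    signed = (dist - radii[None, :]).min(axis=1)
    # The tanh profile replaces the Heaviside jump by a smooth transition.
    return 0.5 * (1.0 - np.tanh(signed / width))
```

Evaluating this function on a regular grid gives a profile that is one deep inside the molecule, zero in the bulk solvent, and strictly between the two in a thin transition layer whose thickness is controlled by `width`.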

We consider the total action functional for the virus complex [30], given as (1). Its ingredients are the electrostatic potential, the surface tension, the pressure, the interaction potential between the solvent and the solute, the Boltzmann constant, the temperature, the bulk concentration of each ionic species, the number of ionic species, and the canonical density of molecular free charges, which is built from the partial charges on the (discrete) atoms. The permittivities of the macromolecule and of the solvent are expressed through the permittivity of vacuum and the respective relative permittivities, and we treat them as constants. Additionally, the solvent and the virus atoms (or coarse-grained particles) have their own mass densities. Finally, the functional involves the fluid velocity, the viscosity of the fluid, the velocity of each atom, and the interaction potential for atoms; a superscript symbol denotes the transpose.

On the right-hand side of (1), the first row is the nonpolar solvation free energy, which includes the surface area effect, the mechanical work (the volume effect), and the solvent-solute interactions. In principle, these interactions account for important dispersion effects and other van der Waals effects. Geometric measure theory is used to derive the expression for the surface area. The second row is the electrostatic (polar) solvation free energy, which has contributions from the virus particle and from the aquatic solvent. Here, the virus particle contributes a set of discrete partial charges, while the ion charges in the solvent are treated as a continuous Boltzmann distribution. This is valid as long as the system is near equilibrium. For systems far from equilibrium, alternative models, such as the Poisson-Nernst-Planck (PNP) equations, are required to describe the density of ionic species [30]. The third row is the Lagrangian of the fluid dynamics subsystem with a negative sign. It consists of the kinetic energy of the fluid flow and the generalized potential energy. The latter includes the pressure and the stress energy. The stress energy represents the energy loss due to the interactions among the fluid particles, which are not explicitly described in the present model. The exact expression of the stress tensor for a real fluid is usually unknown. Newtonian and non-Newtonian fluid approximations are commonly used, in addition to numerous other approximations. Finally, the last row contains the Lagrangian of the virus molecular dynamics subsystem with a negative sign. It consists of the kinetic energy and the potential energy. The latter includes all possible potential interactions among virus atoms or coarse-grained particles. We have chosen negative signs for the two Lagrangians so that the potential energies have positive signs and are consistent with the other potential energies.
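The four-row structure described above can be summarized schematically as follows. This is a reconstruction under stated assumptions, not a verbatim copy of (1). All symbols are notation assumed here: $S$ for the hypersurface function, $\Phi$ for the electrostatic potential, $\gamma$ for the surface tension, $p$ for the pressure, $U$ for the solvent-solute interaction potential, $c_j$ and $q_j$ for the bulk concentration and charge of the $j$th of $N_c$ ionic species (the charge is not named in the text), $\rho_m$ for the density of molecular free charges, $\epsilon_m$ and $\epsilon_s$ for the two permittivities, $\rho_s$ and $\rho_i$ for the mass densities, $\mathbf{v}$ for the fluid velocity, $\mathbf{z}_i$ for the position of the $i$th atom, and $U_{\mathrm{MD}}$ for the atomic interaction potential. The generalized potential energy of the fluid (pressure plus stress energy) is left abstract as $E_{\mathrm{f}}$ because its exact expression cannot be recovered from the prose.

```latex
\begin{aligned}
S_{\text{total}} = \int\!\!\int \Big\{
 & \gamma\,|\nabla S| + p\,S + (1-S)\,U \\
 & + S\Big[-\frac{\epsilon_m}{2}|\nabla\Phi|^2 + \Phi\,\rho_m\Big]
   + (1-S)\Big[-\frac{\epsilon_s}{2}|\nabla\Phi|^2
   - k_B T\sum_{j=1}^{N_c} c_j\big(e^{-q_j\Phi/k_B T}-1\big)\Big] \\
 & - (1-S)\Big[\rho_s\frac{|\mathbf{v}|^2}{2} - E_{\mathrm{f}}\Big] \\
 & - S\Big[\sum_i \rho_i\frac{|\dot{\mathbf{z}}_i|^2}{2} - U_{\mathrm{MD}}(\mathbf{z})\Big]
 \Big\}\, d\mathbf{r}\, dt
\end{aligned}
```

Each line corresponds to one row of the description: nonpolar solvation, polar solvation, the negated fluid Lagrangian weighted by the solvent fraction $1-S$, and the negated molecular dynamics Lagrangian weighted by the virus fraction $S$.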

###### 2.1.2. Governing Equations for Coupled Fluid Dynamics and Molecular Dynamics

In the present work, we derive four governing equations by applying the principle of least action to the total action functional in (1) with respect to four variables: the electrostatic potential, the hypersurface function, the fluid variable, and the atomic positions. Each variable is subjected to an infinitesimally small but nonzero perturbation. In order for the first variation to vanish, the terms associated with the four perturbations have to vanish independently. First, the term associated with the perturbation of the electrostatic potential gives rise to a generalized Poisson-Boltzmann equation, (3), in which a permittivity that depends on the hypersurface function provides a smooth dielectric profile near the interface. This is a new Poisson-Boltzmann equation for overlapping domains. In the sharp-interface limit, (3) reduces to the standard Poisson-Boltzmann equation [31–35] with appropriate interface conditions, which are posed on the virus domain, the solvent domain, and the sharp interface between them, and which involve the normal vector of the surface.
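As a worked illustration of this step, suppose the polar part of the action has the common form $S[-\tfrac{\epsilon_m}{2}|\nabla\Phi|^2 + \Phi\rho_m] + (1-S)[-\tfrac{\epsilon_s}{2}|\nabla\Phi|^2 - k_BT\sum_j c_j(e^{-q_j\Phi/k_BT}-1)]$. The notation ($S$, $\Phi$, $\rho_m$, $\epsilon_m$, $\epsilon_s$, $c_j$, $q_j$, $N_c$, $\Omega_m$, $\Omega_s$, $\Gamma$, $\mathbf{n}$) and this form are assumptions made here for the sketch. Setting the variation with respect to $\Phi$ to zero and integrating the gradient terms by parts then gives a Poisson-Boltzmann equation with a smoothly varying permittivity, which reduces to the familiar two-domain problem when $S$ becomes a step function.

```latex
% Generalized Poisson-Boltzmann equation (smooth interface)
-\nabla\cdot\big(\epsilon(S)\,\nabla\Phi\big)
  = S\,\rho_m + (1-S)\sum_{j=1}^{N_c} c_j q_j\, e^{-q_j\Phi/k_B T},
\qquad
\epsilon(S) = (1-S)\,\epsilon_s + S\,\epsilon_m .

% Sharp-interface limit: S = 1 in \Omega_m, S = 0 in \Omega_s
-\epsilon_m \nabla^2\Phi = \rho_m \quad \text{in } \Omega_m,
\qquad
-\epsilon_s \nabla^2\Phi = \sum_{j=1}^{N_c} c_j q_j\, e^{-q_j\Phi/k_B T} \quad \text{in } \Omega_s,

% Interface conditions on \Gamma with unit normal \mathbf{n}
\Phi_m = \Phi_s,
\qquad
\epsilon_m \nabla\Phi_m\cdot\mathbf{n} = \epsilon_s \nabla\Phi_s\cdot\mathbf{n}
\quad \text{on } \Gamma .
```

The interpolated permittivity $\epsilon(S)$ is what removes the jump in the dielectric coefficient: it equals $\epsilon_m$ where $S=1$, $\epsilon_s$ where $S=0$, and varies continuously across the transition layer.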

Additionally, the virus surface evolution equation, (7), is constructed by requiring the term associated with the perturbation of the hypersurface function in (1) to vanish, followed by the use of the steepest descent scheme. The structure of this equation is very similar to that of the potential driven geometric flows introduced in the earlier work [22, 30], in which a generalized potential collects the appropriate potential interaction terms. Therefore, (7) can be solved by using the same procedure as that described in the earlier work [22].
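A potential driven geometric flow of this kind can be sketched numerically as follows. The sketch assumes the flow has the generic form $\partial S/\partial t = |\nabla S|\,[\nabla\cdot(\gamma\,\nabla S/|\nabla S|) + V]$, with $S$ the hypersurface function, $\gamma$ a constant surface tension, and $V$ a given potential field; this form, the symbols, the explicit Euler time step, the central differences, the regularization `eps`, and the final clipping to $[0, 1]$ are all assumptions made for illustration, not the scheme of [22].

```python
import numpy as np

def geometric_flow_step(S, V, gamma=1.0, dt=1e-3, h=1.0, eps=1e-8):
    """One explicit Euler step of dS/dt = |grad S| * (div(gamma * n) + V) on a 2D grid.

    n = grad S / |grad S| is the (regularized) unit gradient direction.
    """
    S = np.asarray(S, dtype=float)
    V = np.asarray(V, dtype=float)
    # Finite-difference gradient: axis 0 is y, axis 1 is x.
    Sy, Sx = np.gradient(S, h)
    # Regularized gradient norm avoids division by zero in flat regions.
    norm = np.sqrt(Sx**2 + Sy**2 + eps)
    # Curvature-like term: divergence of gamma * grad S / |grad S|.
    div = (np.gradient(gamma * Sx / norm, h, axis=1)
           + np.gradient(gamma * Sy / norm, h, axis=0))
    # Keep the profile between zero and one, as a characteristic function should be.
    return np.clip(S + dt * norm * (div + V), 0.0, 1.0)
```

Repeated calls evolve the profile in pseudo-time: with `V = 0` the curvature term smooths and shrinks the interface, while a positive `V` pushes the interface outward. An explicit step like this is only stable for small `dt`; a production solver would normally use a more careful discretization.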

Moreover, requiring the term associated with the perturbation of the fluid variable to vanish gives rise to a generalized Navier-Stokes equation, (9), for continuum fluid dynamics [30], together with an expression for the stress tensor. A Newtonian fluid is assumed in the present work. The force in (9) comprises several components. The detailed derivation of the generalized Navier-Stokes equation can be found in [30]. In the case of sharp solvent-virus interfaces, the hypersurface function becomes a step function, and (9) reduces to the standard Navier-Stokes equation with simplified force expressions.

Finally, Newton's equation for the molecular dynamics of each atom (particle) in the virus is derived from the term associated with the perturbation of that atom's position. The microscopic force acting on each atom has three components: the solvent-solute interaction force, the reaction field force, and the potential interaction force.
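Once the total force on each particle is available, Newton's equation can be advanced in time with any standard integrator. The sketch below uses velocity Verlet, a common molecular dynamics scheme chosen here for illustration and not taken from this work. The function name, the use of particle masses rather than mass densities, and the user-supplied `force_fn`, which stands in for the sum of the three force components, are all assumptions.

```python
import numpy as np

def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Integrate m * d2x/dt2 = f(x) for a set of particles with velocity Verlet.

    force_fn(x) must return an array of forces with the same shape as x.
    """
    x = np.array(positions, dtype=float)         # shape (n_particles, 3)
    v = np.array(velocities, dtype=float)        # shape (n_particles, 3)
    m = np.asarray(masses, dtype=float)[:, None]
    f = force_fn(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / m   # first half kick with the old force
        x += dt * v             # drift
        f = force_fn(x)         # force at the new positions
        v += 0.5 * dt * f / m   # second half kick with the new force
    return x, v
```

In a coupled simulation, `force_fn` would be re-evaluated from the current electrostatic potential and hypersurface function at every step, which is where the coupling to the continuum equations enters.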

In this multiscale system, all forces are balanced. The fluid dynamics, the molecular dynamics, the electrostatic subsystem, and the hypersurface function are all coupled.

##### 2.2. Coarse-Grained Model

As a part of our multiscale framework, we consider a coarse-grained formalism for viral surface formation and evolution. Coarse-grained models are often used to deal with exceptionally large biological systems. In the present treatment, we consider each amino acid residue as one particle, located at the position of a single representative atom of that residue. The radii of the twenty standard amino acid residues used in the present work are listed in Table 1. Coarse-grained representations are efficient approaches for data size reduction. Combined with enhanced computer power and efficient computational algorithms, coarse-grained approaches currently enable the simulation of systems of biologically relevant size (submicrometric) and timescale (microsecond or millisecond) [29]. Although coarse-grained models cannot be considered as predictive as all-atom ones, they can provide much insight with the use of more rigorous parameterization techniques and efficient algorithms for sampling configurational space. Since the simulation size and timescale of coarse-grained models coincide with those that can be reached with the most advanced spectroscopic techniques, it is possible to directly compare experimental data and simulation predictions. In this work, we explore the use of coarse-grained models for viral surface formation and evolution. Figure 1 presents an illustration of coarse-grained particles for a viral protein subunit. The original full-atomic subunit of the Nodamura virus has about 10 thousand atoms. In the coarse-grained representation, this subunit is reduced to one particle per amino acid residue, and each type of amino acid residue has a particle radius as shown in Table 1. The discrete-continuum model of viral surface representation discussed above is still applicable to the present coarse-grain-continuum setting. However, to use (7) for viral surface formation and evolution, we need to redefine the Lennard-Jones and Coulomb potential parameters to describe the interaction between amino-acid-residue particles.
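The reduction from atoms to residue particles can be sketched as a filter over PDB `ATOM` records. The sketch assumes that the alpha carbon (atom name `CA`) is the representative atom, a common choice for residue-level coarse graining that is adopted here as an assumption. The function name and the optional `radii` mapping are hypothetical, and no radius values are supplied because Table 1 is not reproduced in this text.

```python
def coarse_grain_pdb(pdb_lines, atom_name="CA", radii=None):
    """Keep one particle per residue from fixed-column PDB ATOM records.

    Returns a list of (residue_name, (x, y, z), radius) tuples. The radius is
    looked up in the optional `radii` mapping and is None when not provided.
    """
    radii = radii or {}
    particles = []
    for line in pdb_lines:
        # Columns 13-16 hold the atom name, 18-20 the residue name,
        # and 31-54 the x, y, z coordinates in Angstroms.
        if line.startswith("ATOM") and line[12:16].strip() == atom_name:
            residue = line[17:20].strip()
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            particles.append((residue, xyz, radii.get(residue)))
    return particles
```

Passing the lines of a subunit file, for example `coarse_grain_pdb(open(path))` for some local `path`, yields one entry per residue, which is the particle set on which redefined Lennard-Jones and Coulomb parameters would then act.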

##### 2.3. Viral Data Size Reduction by Symmetry

###### 2.3.1. Symmetry in Virus Capsids

Viral data may involve tens of millions of atomic coordinates and radii, and are enormously large for structural modeling, simulation, and visualization. Viral dynamical cycles may last from milliseconds to days, and real-time full-atom dynamical simulations of viruses are intractable with present computational capabilities [11]. However, viruses typically have very small genomes and encode only a few proteins. In order for a viral capsid to hold all of the viral genetic material, a virus makes use of symmetry in its capsid assembly. Remarkably, most viruses are symmetric, having icosahedral, helical, dihedral, or circular symmetries [6]. As such, an icosahedral virus can self-organize a single protein to generate a capsid of 60 symmetry-related subunits (although some viruses encode hundreds of proteins). Therefore, it is desirable to take advantage of symmetry in viral data analysis, operation, and management. In particular, we propose to make use of viral symmetries, when they are available, in our geometric flow based viral surface formation and evolution. Additionally, we can detect partial and approximate symmetry [36] from viral surfaces and enforce symmetrization [37]. As such, we use geometric flows to generate symmetric facets, or patches, from viral protein subunits, and construct the whole viral surface by symmetric assembly of viral facets; see Figure 2 for an illustration.

###### 2.3.2. Virus Symmetry Transformation

Viruses have adopted five point group symmetries, that is, circular, dihedral, tetrahedral, octahedral, and icosahedral, in their biological assemblies. Mathematically, only three types of symmetry operations, that is, rotation, inversion, and translation, are involved. Starting with the basic set of coordinates of a protein subunit, the virus capsid data can be obtained by a transformation in which a matrix of rotational (or inversion) elements is applied to each coordinate and a vector of translational elements is added. The viral data deposited in the Protein Data Bank (PDB) often have problems with missing sets of transformation operations and erroneous coordinate-frame representations. We make corrections by using the Virus Particle Explorer database [7] (VIPERdb; http://viperdb.scripps.edu/) and/or the Protein Quaternary Structure server (PQS; http://pqs.ebi.ac.uk/).
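The symmetric assembly amounts to applying each rotation (or inversion) matrix and translation vector to the subunit coordinates. The sketch below assumes the transformation has the usual affine form $\mathbf{x}' = R\,\mathbf{x} + \mathbf{T}$; the function names and the five-fold example about the $z$ axis are illustrative assumptions, not the operations of any particular virus.

```python
import numpy as np

def assemble_capsid(subunit_xyz, operations):
    """Apply each (R, T) pair to the subunit coordinates: x' = R @ x + T.

    subunit_xyz has shape (n_atoms, 3); the result has shape
    (n_operations, n_atoms, 3), one transformed copy per symmetry operation.
    """
    xyz = np.asarray(subunit_xyz, dtype=float)
    copies = [xyz @ np.asarray(R, dtype=float).T + np.asarray(T, dtype=float)
              for R, T in operations]
    return np.stack(copies)

def rotation_z(angle):
    """3x3 rotation matrix for a counterclockwise rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Example: a five-fold (circular) ring of copies with no translation.
five_fold = [(rotation_z(2.0 * np.pi * k / 5), np.zeros(3)) for k in range(5)]
```

For an icosahedral capsid the list would instead hold the 60 operations of the icosahedral group, so only the subunit coordinates and the operations need to be stored rather than the full capsid.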

#### 3. Numerical Demonstration

Recent advances in structural biology and microbiology have given rise to an increasing body of structural data for over 300 viruses and viral complexes. Quaternary structures of viruses and viral complexes pose many challenges for viral representation, visualization, and the analysis of virus stability and interaction [6]. The proposed multiscale framework can be studied on a wide range of test cases to demonstrate its usefulness to the research community. However, a full-scale demonstration of the proposed multiscale model is computationally challenging, as it involves computational fluid dynamics (CFD), molecular dynamics (MD) of viruses, and surface dynamics of large systems. In this paper, we primarily focus on virus surface formation and evolution. The coupling of the surface dynamics to the CFD and MD will be studied in our future work and published elsewhere.

We also test two other ideas proposed in this work, that is, the coarse-grained virus model and the use of symmetric assembly for virus surface construction. In particular, we are interested in examining the effect of symmetric assembly on virus surface visualization. As shown in Figure 3, we consider the coarse-grained model, which is an efficient way to reduce computational cost. Additionally, we test the surface construction by using symmetric assembly. In comparison with surfaces constructed by potential driven geometric flows without using the symmetry (lower row), geometric flow surfaces constructed by symmetry (upper row) provide a good representation of the original surfaces. However, one can still see that the contact edges in the surfaces constructed by symmetry are not very smooth. Moreover, we expect symmetric assembly to have some impact on the MD and fluid dynamics, as the symmetry becomes an additional constraint on virus dynamical motions. The soundness of such a constraint needs to be studied. This aspect, as well as many other ideas proposed in this work, will be further explored elsewhere.

#### 4. Concluding Remarks

The control of infectious viruses released by terrorists and the prevention of viral epidemics and pandemics, such as HIV, SARS, H1N1, and bird flu, are of tremendous importance. Understanding viral surface formation and evolution, and viral attachment to and penetration of host cells, is a prerequisite to viral disease prevention and control. This problem, as well as many other similar problems in molecular biology, poses pressing challenges to the theoretical community due to the large number of degrees of freedom involved. The main purpose of the present work is to introduce a differential geometry-based multiscale framework to handle complex biological systems. The present multiscale model couples macroscopic fluid dynamics, microscopic molecular dynamics, and surface dynamics in a unified framework. The differential geometry theory of surfaces is utilized to put the continuum description and the discrete description on an equal footing. The present work constructs a generalized action functional to self-consistently couple different scales. Governing equations for the fluid dynamics, that is, the generalized Navier-Stokes equation, and for the molecular dynamics, that is, Newton's equation, are derived by minimizing the action functional. Additionally, we make use of viral symmetry to dramatically reduce viral data sizes and improve viral visualization. Finally, some of the proposed approaches are demonstrated by the generation of a few virus surfaces.

The proposed differential geometry-based multiscale model can be readily generalized to complex systems with multiple interfaces or many biomolecules. Additionally, the incorporation of a continuum solid description into the present model will be published elsewhere. Finally, the inclusion of a quantum mechanical description can be pursued in a similar way and will also be published elsewhere. Numerical experiments that further demonstrate the proposed ideas are under consideration.

#### Acknowledgments

This work was supported in part by NSF Grants DMS-0616704 and CCF-0936830, and NIH Grants CA-127189 and GM-090208.