Abstract

Although parallel computing is used in existing numerical solutions of the $N$-body problem, the large volume of communication between particles makes the parallel efficiency extremely low. Even when domain decomposition based on short-range interaction is used, the parallel efficiency remains low when $N$ is exceedingly large, because many communications still take place between particles in adjacent subdomains. This paper puts forward an adjacent zero communication parallel cloud computing method for the $N$-body problem with short-range interaction domain decomposition. In this method, adjacent subblock data are exchanged and redundantly stored in advance, so no subblock needs to acquire data from other subblocks during parallel processing; the waiting time for data transmission is saved and the parallel processing efficiency is enhanced substantially.

1. Introduction

The $N$-body problem [1] is stated as follows: given $N$ particles in three-dimensional space that attract one another by universal gravitation, together with their initial positions and velocities, determine their state of motion in space and time. If the planets in the universe are regarded as particles, then their motion under universal gravitation is an $N$-body problem. When $N \geq 3$, it is very difficult or even impossible to obtain solutions with the analytical method (whose basic principle is to express the coordinates and velocities of the celestial bodies as approximate analytical expressions, in the form of series in time or in other small parameters, in order to discuss how the coordinates or orbits change with time) or with the qualitative method (which uses the qualitative theory of differential equations to study the macroscopic laws and global nature of the $N$ bodies in the long term). The most feasible approach is the numerical method [2] (directly obtaining the specific positions of the celestial bodies at a given time by numerically solving the differential equations [3]). In the $N$-body problem [4], the velocity and position of each particle under universal gravitation are recalculated at every time step. Updating the velocity and displacement of one particle requires summing the interactions of all other particles on it, so the computational complexity of each time step is $O(N^2)$. The larger the time step, the greater the error; the smaller the time step, the closer the result is to reality. When $N$ is very large and the time step (TS) is very small, serial computing is not workable, since it cannot complete a calculation of complexity $O(N^2)$ within TS. As a result, when $N$ is very large and TS is very small, parallel computing must be used to accelerate the numerical solution of the $N$-body problem [5, 6].
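The all-pairs update described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this paper: the function names, the semi-implicit Euler integrator, the softening term, and the choice of units in which the gravitational constant `g` equals 1 are all assumptions made for the example.

```python
import math


def accelerations(pos, mass, g=1.0, softening=1e-9):
    """Direct-sum gravitational accelerations.

    Every ordered pair (i, j) is visited, so the cost grows
    quadratically with the number of particles.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from particle i to particle j.
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            # Softening (an assumption) avoids division by zero.
            r2 = sum(c * c for c in d) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += g * mass[j] * d[k] * inv_r3
    return acc


def euler_step(pos, vel, mass, ts):
    """Advance all particles by one time step ts (semi-implicit Euler)."""
    acc = accelerations(pos, mass)
    new_vel = [[v[k] + ts * a[k] for k in range(3)] for v, a in zip(vel, acc)]
    new_pos = [[p[k] + ts * v[k] for k in range(3)] for p, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

Because each call to `accelerations` touches every pair of particles, a parallel version of this loop needs the positions of all particles on every processor at every time step, which is the communication burden discussed next.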

In traditional parallel computing [7–9] and cloud computing [10, 11], large numbers of adjacent data blocks in the complex network [12] of the $N$ bodies need to be exchanged. Whenever the data needed from an adjacent block are not yet available, the parallel process has to wait, which lowers the efficiency of parallel processing. When parallel computing is used to solve the $N$-body problem, updating the velocity and displacement of each particle at each time step requires the interactions of all other particles on it, so each particle needs to communicate with all the remaining particles in each TS. Therefore, on the order of $N^2$ communications need to be carried out in each TS. Such frequent and voluminous communication greatly lowers the efficiency of parallel computing and cancels out its advantages. That is why a method for reducing the amount of communication is needed. To this end, the $N$-body problem is simplified to the $N$-body problem under short-range interaction. This simplification is effective because the force between distant particles is weak: the long-range effect can be neglected and only the short-range effect is considered. Each particle then only needs to communicate with the particles within short range, which significantly cuts the communication traffic.

Domain decomposition divides the physical domain into as many subdomains as there are processors. In each TS, each processor calculates the force, velocity, and displacement of all particles within its subdomain, and when a particle moves into a new subdomain it is reassigned to the corresponding processor. To calculate the forces on the particles within its subdomain, a processor only needs information about the particles in the adjacent subdomains. The domain decomposition parallel algorithm is therefore highly suitable for the $N$-body problem with short-range interaction, because only adjacent subdomains need to communicate with each other to obtain the interactions between their particles.
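The reason why only adjacent subdomains need to communicate can be illustrated with a small sketch. It is a one-dimensional simplification under stated assumptions: the names `assign_domains` and `short_range_pairs`, the uniform subdomain width, and the requirement that the interaction cutoff does not exceed that width are all hypothetical choices made for the example.

```python
def assign_domains(xs, domain_width):
    """Map each subdomain index to the particles whose coordinate lies in it."""
    domains = {}
    for i, x in enumerate(xs):
        domains.setdefault(int(x // domain_width), []).append(i)
    return domains


def short_range_pairs(xs, domain_width, cutoff):
    """Find all pairs closer than cutoff by looking only at a subdomain
    and its right-hand neighbor (the left neighbor covers the other side)."""
    if cutoff > domain_width:
        raise ValueError("cutoff must not exceed the subdomain width")
    domains = assign_domains(xs, domain_width)
    pairs = set()
    for d, members in domains.items():
        candidates = members + domains.get(d + 1, [])
        for i in members:
            for j in candidates:
                if i != j and abs(xs[i] - xs[j]) < cutoff:
                    pairs.add((min(i, j), max(i, j)))
    return pairs
```

With the cutoff no larger than the subdomain width, two interacting particles can never be more than one subdomain apart, so the neighbor-only search finds the same pairs as a search over all particles.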

If the number of particles in each domain is still very large, then even when communication is restricted to adjacent domains, the communication traffic remains heavy and still greatly reduces the parallel efficiency. Therefore, this paper proposes an adjacent zero communication parallel cloud computing method, which can eliminate almost all communication of adjacent data in solving the $N$-body problem with short-range interaction domain decomposition.

2. Adjacent Zero Communication Parallel Cloud Computing Method

The flowchart of the adjacent zero communication parallel cloud computing method is presented in Figure 1. This adjacent zero communication parallel cloud computing method includes the following steps.

S100. The input data to be processed are divided into multiple subblocks, and each subblock redundantly stores the data contained in its adjacent subblocks. Here the input data are data that would take a relatively long time to process as a single unit, while a subblock is a piece of data whose individual processing is relatively simple and hence less time-consuming.

The subblocks divided from the input data should take roughly the same time to process on the parallel processing units, so that the final results are obtained as quickly as possible after the parallel processing and no parallel processing unit is left waiting.

The schematic diagram of traditional data division is presented in Figure 2. The input data 10 are divided into multiple subblocks 20, each of which has adjacent data 30; when the adjacent subblocks 20 undergo parallel processing, they have to obtain the adjacent data 30 from one another.

The division schematic diagram of the adjacent zero communication parallel cloud computing method is presented in Figure 3. Two adjacent subblocks (202 and 204) are used to explain redundant storage. Under the traditional division, the subblock 202 has adjacent data (a) and the subblock 204 has adjacent data (b). In the proposed method, the subblock 202 redundantly stores the adjacent data (b) and the subblock 204 redundantly stores the adjacent data (a). That is, both the subblock 202 and the subblock 204 include the adjacent data units (a) and (b), and each subblock uses (a) and (b) in its own parallel processing.

Redundant storage can be realized in two ways: redundant segmentation, or nonredundant segmentation followed by data exchange.

Redundant segmentation means that, when the data are segmented, each segment boundary is extended by a preset width so that data belonging to the adjacent subblocks are also included. Data segmentation can take forms such as file segmentation, datasheet segmentation, and data matrix segmentation.
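Redundant segmentation can be sketched on a one-dimensional list as follows. The function name `redundant_segments`, the dictionary layout of a block, and the parameter `halo` (standing in for the preset width) are assumptions made for illustration only.

```python
def redundant_segments(data, num_blocks, halo):
    """Split data into num_blocks contiguous blocks and extend each
    boundary by halo items so neighboring data are stored redundantly."""
    n = len(data)
    # Nonoverlapping core boundaries of roughly equal size.
    bounds = [n * b // num_blocks for b in range(num_blocks + 1)]
    blocks = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Extended boundaries, clipped to the ends of the input.
        ext_lo, ext_hi = max(0, lo - halo), min(n, hi + halo)
        blocks.append({
            "core": (lo, hi),          # items this block is responsible for
            "offset": ext_lo,          # global index of values[0]
            "values": data[ext_lo:ext_hi],
        })
    return blocks
```

For ten items split into two blocks with `halo=2`, the first block holds items 0 to 6 and the second holds items 3 to 9, so items 3 to 6 are stored in both.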

Nonredundant segmentation means that the master data are segmented in the traditional way, so that there is no data redundancy between the resulting subblocks. As with redundant segmentation, it can take forms such as file segmentation, datasheet segmentation, and data matrix segmentation. Subsequently, each subblock transmits its adjacent data to its neighbors, receives theirs in exchange, and integrates the received data into its own data; the exchange can use message passing or file transfer.
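The second route, nonredundant segmentation followed by a one-off exchange, can be sketched in the same one-dimensional setting. The exchange is simulated here with in-memory list copies in place of real message passing or file transfer; the names `plain_segments` and `exchange_halos` and the parameter `halo` are hypothetical, and the sketch assumes every block holds at least `halo` items.

```python
def plain_segments(data, num_blocks):
    """Traditional segmentation: contiguous blocks with no shared items."""
    n = len(data)
    bounds = [n * b // num_blocks for b in range(num_blocks + 1)]
    return [list(data[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def exchange_halos(blocks, halo):
    """Give each block copies of the halo items nearest to it in each
    neighboring block, and merge them into the block's own data."""
    if halo <= 0:
        return [list(block) for block in blocks]
    extended = []
    for b, block in enumerate(blocks):
        # Last halo items of the left neighbor, if there is one.
        left = blocks[b - 1][-halo:] if b > 0 else []
        # First halo items of the right neighbor, if there is one.
        right = blocks[b + 1][:halo] if b + 1 < len(blocks) else []
        extended.append(left + block + right)
    return extended
```

After the exchange, each block holds the same data as it would under redundant segmentation with the same width, so the two routes are interchangeable from the point of view of the later parallel processing.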

S200. The subblocks undergo parallel processing. After obtaining their respective redundantly stored subblocks, the parallel processing units carry out the processing in parallel.

In this parallel data processing method, each subblock already stores the data that it would otherwise have to request from other subblocks, so it does not need to obtain data from them during parallel processing. The waiting time for data transmission is thus saved and the parallel processing efficiency is enhanced.
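Step S200 can be illustrated with a toy short-range computation in which each item is combined with its immediate neighbors. The sketch is an assumption-laden stand-in for the real particle update: the names `neighbor_sum`, `make_blocks`, `process_block`, and `parallel_neighbor_sums` are hypothetical, a thread pool stands in for the parallel processing units, and the redundant width is fixed at one item.

```python
from concurrent.futures import ThreadPoolExecutor


def neighbor_sum(values, i):
    """Toy short-range interaction: sum of items i-1, i, and i+1 that exist."""
    return sum(values[max(0, i - 1):i + 2])


def make_blocks(data, num_blocks, halo=1):
    """Redundant segmentation: (local values, local start of core, core length)."""
    n = len(data)
    bounds = [n * b // num_blocks for b in range(num_blocks + 1)]
    blocks = []
    for lo, hi in zip(bounds, bounds[1:]):
        ext_lo, ext_hi = max(0, lo - halo), min(n, hi + halo)
        blocks.append((data[ext_lo:ext_hi], lo - ext_lo, hi - lo))
    return blocks


def process_block(block):
    """Process the core items of one block using only its local data."""
    values, start, count = block
    return [neighbor_sum(values, start + k) for k in range(count)]


def parallel_neighbor_sums(data, num_blocks):
    """Process all blocks concurrently; no block reads another block's data."""
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        parts = list(pool.map(process_block, make_blocks(data, num_blocks)))
    return [x for part in parts for x in part]
```

Each call to `process_block` receives everything it needs in its argument, so the workers never wait for one another; the concatenated result equals the one obtained by processing the whole list serially.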

3. Adjacent Zero Communication Parallel Cloud Computing System for $N$-Body Problem with Short-Range Interaction Domain Decomposition

Although particles in the $N$-body problem with short-range interaction domain decomposition move from one domain to another over time, the set of particles in each domain changes little between adjacent time steps, so the communication traffic caused by particles transferring between domains is relatively light. At each time step, only the particles that have changed, together with their status, need to be updated in the current domain and in its adjacent domains. The communication required by a time step is therefore limited to the particle transfers themselves and the updates of their redundant backups in the adjacent domains. This traffic is extremely light and is negligible compared with the traffic produced when the $N$-body problem with short-range interaction domain decomposition is solved without the adjacent zero communication parallel cloud computing method.
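The residual traffic described above can be estimated with a rough counting sketch. It is not the cost model of this paper: the names `owner` and `transfer_messages`, the one-dimensional domains, and the assumption that each transferred particle costs one message to its new owner plus one redundant-backup update per adjacent domain are illustrative choices only.

```python
def owner(x, domain_width):
    """Index of the domain that contains coordinate x."""
    return int(x // domain_width)


def transfer_messages(old_xs, new_xs, domain_width, neighbors_per_domain=2):
    """Return the particles that changed domain in one time step and a
    rough message count for transferring them.

    Assumed cost per transferred particle: one message to the new owner
    plus one redundant-backup update for each adjacent domain.
    """
    moved = [
        i for i, (a, b) in enumerate(zip(old_xs, new_xs))
        if owner(a, domain_width) != owner(b, domain_width)
    ]
    return moved, len(moved) * (1 + neighbors_per_domain)
```

When only a small fraction of particles cross a domain boundary in a time step, the count returned here is small compared with exchanging every boundary particle at every step.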

The parallel processing system based on redundant segmentation is shown in Figure 4. This system includes the data segmentation module 100 and the parallel processing unit 200.

The data segmentation module 100 conducts redundant segmentation of the input data, so that each subblock redundantly stores the data contained in its adjacent subblocks; redundant segmentation means extending each segment boundary by a preset width when the data are segmented. Data segmentation can take forms such as file segmentation, datasheet segmentation, and data matrix segmentation.

The parallel processing unit 200 accepts scheduling and conducts parallel processing of the subblocks. The parallel processing unit 200 is used to conduct parallel computing, distributed computing, network computing, grid computing, cloud computing, or sea computing.

The parallel processing system based on nonredundant segmentation followed by data exchange is shown in Figure 5. This system includes the data segmentation module 100′, the data exchange module 200′, and the parallel processing unit 300. The data segmentation module 100′ conducts nonredundant segmentation of the master data; that is, the master data are segmented in the traditional way and there is no data redundancy between the resulting subblocks. The segmentation can take forms such as file segmentation, datasheet segmentation, and data matrix segmentation.

The data exchange module 200′ exchanges and redundantly stores the adjacent subblock data. Each subblock transmits its adjacent data to its neighbors, receives theirs in exchange, and integrates the received data into its own data; the exchange can use message passing or file transfer.

The parallel processing unit 300 accepts scheduling and conducts parallel processing of the subblocks. The parallel processing unit 300 is used to conduct parallel computing, distributed computing, network computing, grid computing, cloud computing, or sea computing.

4. Results

The time step of the numerical solutions of $N$-body problem is TS. $N$-body is divided into domains, each of which has fields; the number of particles transferred in each TS is about ; the space time status of the $N$-body at is solved, wherein is generally small, such as 0.01%, and is also generally small, such as 4 or 6.

The parallel cloud computing method not using the domain decomposition method (NDPC) is as follows:

The parallel cloud computing method not using the adjacent zero communication but using the domain decomposition method (NZDPC) is as follows:

Parallel cloud computing method using the adjacent zero communication and using the domain decomposition method (ZDPC) is as follows:

It can be observed from Table 1 that NDPC-EV > NZDPC-EV > ZDPC-EV, NDPC-TV > NZDPC-TV > ZDPC-TV, and when is extremely large and is extremely small, the gap is larger so that the enormous advantage of parallel cloud computing ZDPC using the adjacent zero communication and the domain decomposition method can be demonstrated.

For example, = , TS = 1 ms, , = 6, = , and =  ms.

NDPC is as follows:;;

NZDPC is as follows:;;

ZDPC is as follows:;.

According to Figure 6, the parallel cloud computing method using both the adjacent zero communication and the domain decomposition method (ZDPC) has the lowest communication traffic. It is significantly more advantageous than the parallel cloud computing method not using the domain decomposition method (NDPC) and the parallel cloud computing method using the domain decomposition method but not the adjacent zero communication (NZDPC).

5. Conclusions

The adjacent zero communication parallel cloud computing method proposed in this paper segments the input data to be processed into multiple subblocks, each of which redundantly stores the data of its adjacent subblocks, and then processes the subblocks in parallel. The method is applied to the $N$-body problem with short-range interaction domain decomposition. Because adjacent subblock data are exchanged and redundantly stored in advance, no subblock needs to acquire data from other subblocks during parallel processing. The waiting time for data transmission is therefore saved and the parallel processing efficiency is enhanced substantially, in particular for the $N$-body problem with short-range interaction domain decomposition.

Competing Interests

The author declares that he has no competing interests.

Acknowledgments

This research was supported by Major Project of Guangdong Province under Grant no. 2014B090901064, Project of Guangdong Province under Grant no. 2015A010103013, Major Project of National Social Science Fund under Grant no. 14ZDB101, and National Natural Science Foundation of China under Grant no. 61105133.