High Performance Computation of a Jet in Crossflow by Lattice Boltzmann Based Parallel Direct Numerical Simulation
Direct numerical simulation (DNS) of a round jet in crossflow based on lattice Boltzmann method (LBM) is carried out on multi-GPU cluster. Data parallel SIMT (single instruction multiple thread) characteristic of GPU matches the parallelism of LBM well, which leads to the high efficiency of GPU on the LBM solver. With present GPU settings (6 Nvidia Tesla K20M), the present DNS simulation can be completed in several hours. A grid system of 1.5 × 108 is adopted and largest jet Reynolds number reaches 3000. The jet-to-free-stream velocity ratio is set as 3.3. The jet is orthogonal to the mainstream flow direction. The validated code shows good agreement with experiments. Vortical structures of CRVP, shear-layer vortices and horseshoe vortices, are presented and analyzed based on velocity fields and vorticity distributions. Turbulent statistical quantities of Reynolds stress are also displayed. Coherent structures are revealed in a very fine resolution based on the second invariant of the velocity gradients.
A jet in crossflow (JICF), also known as transverse jet, normally describes a jet of fluid which enters and interacts with the crossflow and the resulting flow field. JICF has wide applications in many engineering fields, such as gas turbine impingement cooling and film cooling, fuel injection in combustors, thrust vectoring system in turbojet propulsion, and reaction control for missiles. For a traditional JICF problem, near the flow field of the transverse jet, four types of vortical structures can be observed  due to the interaction between mainstream and jet as shown in Figure 1: (1) the counter-rotating vortex pair (CRVP), also known as kidney vortices; (2) the horseshoe vortices system; (3) the jet shear-layer vortices due to Kelvin-Helmholtz instabilities; (4) the wake vortices. The CRVP and the horseshoe vortices are normally defined by mean flow even though they may have unsteady part. The shear-layer vortices and the wake vortices are naturally unsteady.
Those vortical structures are very important to fluid flow and heat transfer behaviors of those fields where JICF is applied. If gas turbine film cooling is taken as an example, film coverage and the cooling effectiveness are closely relevant to those large structures, no matter whether they are defined as steady or naturally unsteady. The “steady” CRVP lifts the coolant up and mixes with the mainstream, weakening the film coverage and cooling effectiveness. The “unsteady” shear-layer vortices have different directions determined by the shape of the ejection hole and consequently either strengthen or suppress the CRVP . To accurately predict those large structures in which unsteadiness and anisotropy are inherent, a time-resolved scheme is required in the flow field calculation with fine spatial discretization.
The flow field of JICF is characterized by both anisotropic large-scale structures which break down to smaller sizes. These resultant small scale structures are isotropic and dissipation is dominant. The typical energy spectrum for the JICF inherently requires capturing the flow structures across the spectrum of all scales and thus a time and space accurate calculation is needed. Direct numerical simulation (DNS) and large eddy simulation (LES) are commonly used to resolve the turbulent characteristics of JICF in large-scale parallel computation devices. However, inherent features of Navier-Stokes equation or transport equation make the parallel processing with low efficiency. Thus, a highly parallel scheme is extremely helpful to those large-scale simulations of turbulent flow field.
Lattice Boltzmann method (LBM) has been regarded as a promising candidate for years due to its major merits of fully parallel algorithm. Unlike conventional numerical schemes based on assumption and discretizations of macroscopic continuum, the LBM adopts microscopic models . Other advantages of LBM include (1) easy treatment of boundary conditions and (2) easy programming. As a result, the LBM has its wide application in scientific and engineering research, such as Biofluid and porous medium [4–6]. However, to resolve the turbulent flow, LBM-based DNS and LES require a very high grid resolution and thus huge computation resource is needed.
Graphic Processing Unit (GPU) brings excellent hardware conditions for these simulations. With the development of computing platforms, for example, CUDA  and OpenCL, the use of GPU to accelerate nongraphic computations has drawn much more attention [8, 9]. Due to its high performance of floating-point arithmetic operation, together with wide memory bandwidth and good programmability, general-purpose GPU (GPGPU) has its huge advantage over CPU in turbulent flow simulation [10–12] and thus has been applied in fields like weather prediction, crystal growth process, and air flow in the city [13–15].
The feasibility for LBM-based DNS has some preliminary validation for turbulent flows between parallel plates ; further verification for JICF is still empty. In this paper, the lattice Boltzmann based DNS on JICF model is performed using model in multi-GPUs. With further validation of this code, this paper aims to resolve the JICF flow field, including the time-averaged and instantaneous velocities, vorticities, and turbulent quantities of Reynolds stress. The characteristic coherent hairpin structures are also revealed and discussed.
2. Lattice Boltzmann Equations
Lattice Boltzmann equation is the special form of the Boltzmann-BGK equation , in which discretization applies to velocity, time, and space. The Boltzmann equation for the discrete velocity distribution on a discrete lattice is as follows: where is the particle velocity distribution function and is the particle velocity in the th direction. Here, the scheme of LBM is described as . “” means “ dimension” and “” stands for “ lattice speeds.” For two-dimensional problems, model is most commonly used and means and 9 speeds. For three-dimensional problems, several cubic models are used, such as , , , and (, 15, 19, or 27). is the collision operator. With Boltzmann-BGK approximation,Combine (1) and (2); then it changes towhere is the local equilibrium and is the relaxation time. The equilibrium distribution form has to be correctly adopted to restore the Navier-Stokes equations. For models with the 13-, 15-, 19-, and 27-speed lattice, an appropriate distribution function is written as
In the present paper, scheme is used with particle velocity as shown in Figure 2: , , , , , , , , , , , , , , , , , , and . Corresponding weighting factors are , , and . If (3) is further discretized in both space and time , the discretized form can be written asThe above equation is the LBGK model, in which is the nondimensional relaxation time. The viscosity in the macroscopic Navier-Stokes equation can be expressed as . Equation (5) is usually solved in a standard way assuming and divided into two steps as follows.
Collision step is as follows:streaming step is as follows:where and denote the pre- and postcollision states of the distribution function. Please note that in the collision step is calculated absolutely locally and in streaming step is only relevant to its neighboring nodes. Thus, the LBM code itself is highly suitable for parallel processing and capable of obtaining high efficiency. Macroscopic quantities, such as and, can be calculated as follows:The lattice speed of sound is , and pressure can be obtained from the state equation of the ideal gas:Considering boundaries of in transverse direction (front and back boundaries), periodic boundary conditions are used in this paper. For solid walls, the following boundary conditions are applied :in which is calculated by macroscopic and on boundaries by (4). The nonequilibrium distribution on boundaries is assumed to be of the same value as that in the neighboring inner node . and on the inner node are computed by (6) and (4), respectively.
3. GPU Settings
A well-known inherent advantage of lattice Boltzmann method is its parallelism. Previous study  compared the acceleration performance between Navier-Stokes solver and current LBM solver, in which incompressible flow around a cylinder was simulated on GPU (GeForce GTX280) and CPU (Xeon E5420, 2.5 GHz) separately. For the Navier-Stokes solver in which Red-Black scheme and multi-grids were applied, the GPU acceleration over CPU was 13.7. Comparatively, for LBM on the same grid scale, the acceleration was 87.4. In addition to the acceleration performance of LBM on GPU as mentioned, the difficulty of coding is much smaller for LBM than Navier-Stokes on GPU.
However, the expense to LBM’s parallelism is also obvious due to substantial variables and consequent huge memory consumption. For the model in single precision calculation, the theoretical GPU memory requirement on 1.5 × 108 grids (1024 × 256 × 600) is over 30 G. Besides, data transfer between GPUs is realized by MPI (message passing interface) and CudaMemcpy() sentences in CUDA. From the previous experience , if the number of GPUs is less than 10, the domain partitioning is more efficient than and . In the current research, 6 GPUs of Nvidia Tesla K20M are used and the current partitioning of the GPUs is illustrated in Figure 3.
4. Physical and Numerical Models
The physical model of the jet in cross flow problem is shown in Figure 4. The computational domain is hexahedral in shape. The mainstream flow is in -direction, while the jet is orthogonal to the mainstream in -direction and ejected from the bottom plane uniformly with diameter . Origin of the coordinates is located at the center of the round jet exit. The flow domain is zero-gradient in pressure and no-slip boundary conditions are applied on the bottom plane excluding the jet exit. On transverse planes (with normal in -direction), periodic boundary conditions are applied. The inlet turbulent profile of velocity is given through a calculation procedure similar to the so-called “rescaling process” as described in . The results presented in this paper are based on flow parameters and boundary conditions listed as follows.(i)Ejecting hole diameter .(ii)Domain length .(iii)Domain span .(iv)Domain height .(v)Free-stream turbulence intensity %.(vi)Boundary layer height .(vii)Velocity ratio .
Reynolds number is defined based on jet velocity and film hole diameter as . The largest Reynolds number reaches 3,000 on grids (1024 × 256 × 600), and several calculations are performed with different dimensions, grids, and jet locations trialed as shown in Table 1. The discussion is based on the 1st case in the table. The flow domain is kept in the “low-speed” range, and velocity magnitude is smaller than 0.3 times of the lattice sound speed . Thus, the LBGK model used in current study is suitable .
5. Model Validation
Nondimensional time-averaged streamwise velocity versus is presented in Figure 5. The location is selected before the jet at and to eliminate the influence from the jet. A turbulent boundary layer profile is obtained, consisting of laminar sublayer and log layer, which is consistent with the Von Karman mixing length theory.
Further comparisons with experiments are made in regions where the mainstream is disturbed by the jet as shown in Figure 6. The experiments were done by Meyer et al. [23, 24] using stereoscopic PIV in a JICF problem. Time-averaged streamwise and spanwise velocities ( and ) are presented in spanwise direction (with respect to ) at locations of , and , . Results show good agreement with experimental data. In current simulation, a uniformly distributed velocity profile is set at the jet exit, while, in experiment, the jet is supplied through a tube and a turbulent profile has been developed inside it. Thus, an overestimation of the jet’s momentum near its boundary of the jet hole can be expected and seen in Figure 6.
6. Results and Analysis
6.1. Energy Spectra
Fast Fourier transformation (FFT) is performed to a time series of turbulent energy at the sample frequency of 2000 at , , and . The turbulent energy is defined as . The corresponding turbulent power spectrum is presented versus frequency as shown in Figure 7. The calculated turbulence decay rate is close to the theoretical canonical value of −5/3 and no obvious spectrum buildup is observed, which indicate the current mesh resolution is adequate.
6.2. Velocity Field and Vorticity
The time-averaged velocity () and instantaneous velocity () fields in the middle plane of -direction () are plotted in Figures 8(a), 8(b), and 8(c), respectively. The corresponding vorticity distributions based on the time-averaged and instantaneous velocities are also shown in the figure. In Figure 8(a), the vertical jet is bended toward exit by the mainstream flow and phenomenon like flow passing a solid obstacle is observed in the wake region. As shown in the enlarging plot of flow field, the approaching wall boundary layer meets with adverse pressure gradient before the jet and consequently separates and forms the horseshoe vortex. Considering the vorticity field, negative and positive values are observed in leading edge and trailing edge along the jet trajectory and trivial values exist in the wake region. Instantaneous velocity and vorticity fields at time steps 80000 and 120000 are shown in Figures 8(b) and 8(c), respectively. Generation and shedding of Kelvin-Helmholtz vortices are introduced by the shear layers between the mainstream and the jet on both leading edge and trailing edge. The shed vortices are mixing with the mainstream and dissipated in the downstream region quickly. In the wake region of the jet, wake vortices are observed through those nontrivial vorticity values; however, no vortex tube is directly observed in this cross section view. To compare the time-averaged fields with their instantaneous counterparts, the Kelvin-Helmholtz vortices are not available after averaging, which proves the shear-layer induced vortices are inherently unsteady and periodic.
(b) Time step = 80000
(c) Time step = 120000
The instantaneous velocity () and time-averaged velocity () fields are displayed at locations , , and , respectively, in Figure 9. The corresponding vorticity distributions are also displayed in the figure. In Figure 9(a), the instantaneous velocity vectors and vorticity distributions at time step 80000 are shown. Separation of mainstream occurs near the mid-chord portion. Further, vortex shedding and dissipation are observed in the downstream area. Wake region is formed at the backside of the jet. A characteristic flow pattern of the wake by the transverse jet is obtained which is significantly different from that by a solid cylinder as described in . In Figures 9(b) and 9(c), the time-averaged velocity vectors and vorticity distributions are displayed. Similar to flow passing a solid cylinder, along the streamwise direction, positive and negative values appear when the mainstream starts to separate. In downstream region, the vorticity recovers to trivial values due to the dissipation of those periodically-shed vortices.
The time-averaged velocity () fields are presented in several streamwise planes at locations , , and in Figures 10(a), 10(b), and 10(c), respectively, with vorticity distributions. As shown in Figures 10(a) and 10(b), the CRVP system consists of two layers, the upper one and the lower one. For a round hole, as used in the current study, the rotating directions of the two pairs are the same. This “two-deck” vortices structure is very similar to that observed in  by laser-induced fluorescence. By comparing among Figures 10(a), 10(b), and 10(c), it is found that when moving downstream, the vertical component () of the velocity reduces because of the jet’s bending. At the same time, the lower layer vortices lift in vertical direction and weaken in strength. Finally, in far downstream region (), the -component is at the same level of -component since the jet is almost fully bended and those two layers fully emerge with strength further weakened.
6.3. Turbulent Statistics
Reynolds stress distributions of , , and are plotted in the midplane of -direction () in Figures 11(a), 11(b), and 11(c), respectively. The shear stress component of the Reynolds stress tensor is shown in Figure 11(a). Strong anisotropy of the flow field can be observed in the flow field. Negative values of appear in the leading edge of the jet with positive values in the trailing portion. In the wake region, both positive and negative values exist with big nonuniformity. In Figures 11(b) and 11(c), the normal Reynolds stress is displayed. For both and , the highest values appear in the leading and trailing portions of the jet due to large velocity gradients. Right behind the jet in the recirculation zone, both components maintain some nontrivial values, especially , indicating the wrapping motion around the jet by the fluid flow.
6.4. Coherent Structure
The isosurfaces of second invariant of velocity gradient at time steps 80,000 and 120,000 with domain height of are plotted in Figure 12. The definition of is given by (11) in tensor form:where and denote the rate of rotation and rate of shearing, respectively. The hair-pin-shaped coherent structures are originated from the side and trailing edge of the ejecting hole and wrapping around the jet. In the downstream region, the hair-pin structure is weakened and dissipated into the mainstream.
(a) Time step = 80000
(b) Time step = 120000
A direct numerical simulation of jet in crossflow based on lattice Boltzmann method is performed on multi-GPUs. The current simulation takes about 10 hours based on largest grid of 1.5 × 108. Several points can be summarized and concluded as follows.(1)A turbulent boundary layer is observed before the jet (, ), consisting of laminar sublayer and turbulent core. After the jet at , and , , time-averaged streamwise and spanwise velocity components ( and ) are compared with experiments, and good agreement is obtained.(2)The turbulent energy spectrum is plotted versus frequency at , , and . A decay rate close to the theoretical value of −5/3 is observed.(3) velocity vectors and vorticity fields are plotted in -, -, and -direction planes. Unsteady shear-layer vortices are formed and shed along the leading and trailing side of the jet trajectory. Horseshoe vortex is generated in the plate boundary layer before the jet. Two-deck structure of the CRVP is retained and the lower pair is observed to rise along the streamwise direction.(4)The normal components ( and ) and shear component () of the Reynolds stress are revealed in the plane. Strong anisotropy is observed for the flow field due to the shear layer and wake.(5)Coherent structure represented by the second invariant of velocity gradient () is plotted at different time steps. The characteristic hair-pin vortices are observed.
Conflict of Interests
The authors declare that there is no any conflict of interests regarding the publication of this paper.
The work is supported by 973 Project of China (2013CB035702), the National Natural Science Foundation of China (11302165 and 11402191), and Postdoctoral Science Foundation of China (2013M540741).
Y. L. He, Y. Wang, and Q. Li, Lattice Boltzmann Method: Theory and Applications, Science Press, Beijing, China, 2009.
nVIDIA, “NVIDIA CUDA Compute Unified Device Architecture Programming Guide Version 2.0,” nVIDIA, 2008.View at: Google Scholar
I. Buck, T. Foley, and D. Horn, “Brook for GPUs: stream computing on graphics hardware,” ACM Transactions on Graphics, vol. 23, pp. 777–786, 2004.View at: Google Scholar
S. Ogawa and T. Aoki, “GPU Computing for 2-dimensional incompressible-flow simulation based on multi-grid method,” Transactions of JSCES, Article ID 20090021, 2009.View at: Google Scholar
T. Shimokawabe, T. Aoki, J. Ishida, K. Kawano, and C. Muroi, “145 TFlops performance on 3990 GPUs of TSUBAME 2.0 supercomputer for an operational weather prediction,” in Proceedings of the 11th International Conference on Computational Science (ICCS '11), vol. 4, pp. 1535–1544, June 2011.View at: Publisher Site | Google Scholar
X. Wang, Y. Shangguan, N. Onodera, H. Kobayashi, and T. Aoki, “Direct numerical simulation and large eddy simulation on a turbulent wall-bounded flow using lattice boltzmann method and multiple GPUs,” Mathematical Problems in Engineering, vol. 2014, Article ID 742432, 10 pages, 2014.View at: Publisher Site | Google Scholar | MathSciNet
D. Xu, G. Chen, X. Wang, and Y. Li, “Direct numerical simulation of the wall-bounded turbulent flow by Lattice Boltzmann method based on multi-GPUs,” Applied Mathematics and Mechanics, vol. 34, no. 9, pp. 1–9, 2013.View at: Google Scholar
Z. L. Guo and C. G. Zheng, Theory and Applications of Lattice Boltzmann Method, Science Press, Beijing, China, 2010.
K. E. Meyer, O. Özcan, P. S. Larsen, and C. H. Westergaard, “Steroscopic PIV measurements in a jet in crossflow,” in Proceedings of the 2nd International Symposium on Turbulence and Shear Flow Phenomena, Stockholm, Sweden, June 2001.View at: Google Scholar