Research Article  Open Access
High Performance Computation of a Jet in Crossflow by Lattice Boltzmann Based Parallel Direct Numerical Simulation
Abstract
Direct numerical simulation (DNS) of a round jet in crossflow based on lattice Boltzmann method (LBM) is carried out on multiGPU cluster. Data parallel SIMT (single instruction multiple thread) characteristic of GPU matches the parallelism of LBM well, which leads to the high efficiency of GPU on the LBM solver. With present GPU settings (6 Nvidia Tesla K20M), the present DNS simulation can be completed in several hours. A grid system of 1.5 × 10^{8} is adopted and largest jet Reynolds number reaches 3000. The jettofreestream velocity ratio is set as 3.3. The jet is orthogonal to the mainstream flow direction. The validated code shows good agreement with experiments. Vortical structures of CRVP, shearlayer vortices and horseshoe vortices, are presented and analyzed based on velocity fields and vorticity distributions. Turbulent statistical quantities of Reynolds stress are also displayed. Coherent structures are revealed in a very fine resolution based on the second invariant of the velocity gradients.
1. Introduction
A jet in crossflow (JICF), also known as transverse jet, normally describes a jet of fluid which enters and interacts with the crossflow and the resulting flow field. JICF has wide applications in many engineering fields, such as gas turbine impingement cooling and film cooling, fuel injection in combustors, thrust vectoring system in turbojet propulsion, and reaction control for missiles. For a traditional JICF problem, near the flow field of the transverse jet, four types of vortical structures can be observed [1] due to the interaction between mainstream and jet as shown in Figure 1: (1) the counterrotating vortex pair (CRVP), also known as kidney vortices; (2) the horseshoe vortices system; (3) the jet shearlayer vortices due to KelvinHelmholtz instabilities; (4) the wake vortices. The CRVP and the horseshoe vortices are normally defined by mean flow even though they may have unsteady part. The shearlayer vortices and the wake vortices are naturally unsteady.
Those vortical structures are very important to fluid flow and heat transfer behaviors of those fields where JICF is applied. If gas turbine film cooling is taken as an example, film coverage and the cooling effectiveness are closely relevant to those large structures, no matter whether they are defined as steady or naturally unsteady. The “steady” CRVP lifts the coolant up and mixes with the mainstream, weakening the film coverage and cooling effectiveness. The “unsteady” shearlayer vortices have different directions determined by the shape of the ejection hole and consequently either strengthen or suppress the CRVP [2]. To accurately predict those large structures in which unsteadiness and anisotropy are inherent, a timeresolved scheme is required in the flow field calculation with fine spatial discretization.
The flow field of JICF is characterized by both anisotropic largescale structures which break down to smaller sizes. These resultant small scale structures are isotropic and dissipation is dominant. The typical energy spectrum for the JICF inherently requires capturing the flow structures across the spectrum of all scales and thus a time and space accurate calculation is needed. Direct numerical simulation (DNS) and large eddy simulation (LES) are commonly used to resolve the turbulent characteristics of JICF in largescale parallel computation devices. However, inherent features of NavierStokes equation or transport equation make the parallel processing with low efficiency. Thus, a highly parallel scheme is extremely helpful to those largescale simulations of turbulent flow field.
Lattice Boltzmann method (LBM) has been regarded as a promising candidate for years due to its major merits of fully parallel algorithm. Unlike conventional numerical schemes based on assumption and discretizations of macroscopic continuum, the LBM adopts microscopic models [3]. Other advantages of LBM include (1) easy treatment of boundary conditions and (2) easy programming. As a result, the LBM has its wide application in scientific and engineering research, such as Biofluid and porous medium [4–6]. However, to resolve the turbulent flow, LBMbased DNS and LES require a very high grid resolution and thus huge computation resource is needed.
Graphic Processing Unit (GPU) brings excellent hardware conditions for these simulations. With the development of computing platforms, for example, CUDA [7] and OpenCL, the use of GPU to accelerate nongraphic computations has drawn much more attention [8, 9]. Due to its high performance of floatingpoint arithmetic operation, together with wide memory bandwidth and good programmability, generalpurpose GPU (GPGPU) has its huge advantage over CPU in turbulent flow simulation [10–12] and thus has been applied in fields like weather prediction, crystal growth process, and air flow in the city [13–15].
The feasibility for LBMbased DNS has some preliminary validation for turbulent flows between parallel plates [16]; further verification for JICF is still empty. In this paper, the lattice Boltzmann based DNS on JICF model is performed using model in multiGPUs. With further validation of this code, this paper aims to resolve the JICF flow field, including the timeaveraged and instantaneous velocities, vorticities, and turbulent quantities of Reynolds stress. The characteristic coherent hairpin structures are also revealed and discussed.
2. Lattice Boltzmann Equations
Lattice Boltzmann equation is the special form of the BoltzmannBGK equation [17], in which discretization applies to velocity, time, and space. The Boltzmann equation for the discrete velocity distribution on a discrete lattice is as follows: where is the particle velocity distribution function and is the particle velocity in the th direction. Here, the scheme of LBM is described as [18]. “” means “ dimension” and “” stands for “ lattice speeds.” For twodimensional problems, model is most commonly used and means and 9 speeds. For threedimensional problems, several cubic models are used, such as , , , and (, 15, 19, or 27). is the collision operator. With BoltzmannBGK approximation,Combine (1) and (2); then it changes towhere is the local equilibrium and is the relaxation time. The equilibrium distribution form has to be correctly adopted to restore the NavierStokes equations. For models with the 13, 15, 19, and 27speed lattice, an appropriate distribution function is written as
In the present paper, scheme is used with particle velocity as shown in Figure 2: , , , , , , , , , , , , , , , , , , and . Corresponding weighting factors are , , and . If (3) is further discretized in both space and time , the discretized form can be written asThe above equation is the LBGK model, in which is the nondimensional relaxation time. The viscosity in the macroscopic NavierStokes equation can be expressed as . Equation (5) is usually solved in a standard way assuming and divided into two steps as follows.
Collision step is as follows:streaming step is as follows:where and denote the pre and postcollision states of the distribution function. Please note that in the collision step is calculated absolutely locally and in streaming step is only relevant to its neighboring nodes. Thus, the LBM code itself is highly suitable for parallel processing and capable of obtaining high efficiency. Macroscopic quantities, such as and, can be calculated as follows:The lattice speed of sound is , and pressure can be obtained from the state equation of the ideal gas:Considering boundaries of in transverse direction (front and back boundaries), periodic boundary conditions are used in this paper. For solid walls, the following boundary conditions are applied [19]:in which is calculated by macroscopic and on boundaries by (4). The nonequilibrium distribution on boundaries is assumed to be of the same value as that in the neighboring inner node . and on the inner node are computed by (6) and (4), respectively.
3. GPU Settings
A wellknown inherent advantage of lattice Boltzmann method is its parallelism. Previous study [20] compared the acceleration performance between NavierStokes solver and current LBM solver, in which incompressible flow around a cylinder was simulated on GPU (GeForce GTX280) and CPU (Xeon E5420, 2.5 GHz) separately. For the NavierStokes solver in which RedBlack scheme and multigrids were applied, the GPU acceleration over CPU was 13.7. Comparatively, for LBM on the same grid scale, the acceleration was 87.4. In addition to the acceleration performance of LBM on GPU as mentioned, the difficulty of coding is much smaller for LBM than NavierStokes on GPU.
However, the expense to LBM’s parallelism is also obvious due to substantial variables and consequent huge memory consumption. For the model in single precision calculation, the theoretical GPU memory requirement on 1.5 × 10^{8} grids (1024 × 256 × 600) is over 30 G. Besides, data transfer between GPUs is realized by MPI (message passing interface) and CudaMemcpy() sentences in CUDA. From the previous experience [11], if the number of GPUs is less than 10, the domain partitioning is more efficient than and . In the current research, 6 GPUs of Nvidia Tesla K20M are used and the current partitioning of the GPUs is illustrated in Figure 3.
4. Physical and Numerical Models
The physical model of the jet in cross flow problem is shown in Figure 4. The computational domain is hexahedral in shape. The mainstream flow is in direction, while the jet is orthogonal to the mainstream in direction and ejected from the bottom plane uniformly with diameter . Origin of the coordinates is located at the center of the round jet exit. The flow domain is zerogradient in pressure and noslip boundary conditions are applied on the bottom plane excluding the jet exit. On transverse planes (with normal in direction), periodic boundary conditions are applied. The inlet turbulent profile of velocity is given through a calculation procedure similar to the socalled “rescaling process” as described in [21]. The results presented in this paper are based on flow parameters and boundary conditions listed as follows.(i)Ejecting hole diameter .(ii)Domain length .(iii)Domain span .(iv)Domain height .(v)Freestream turbulence intensity %.(vi)Boundary layer height .(vii)Velocity ratio .
Reynolds number is defined based on jet velocity and film hole diameter as . The largest Reynolds number reaches 3,000 on grids (1024 × 256 × 600), and several calculations are performed with different dimensions, grids, and jet locations trialed as shown in Table 1. The discussion is based on the 1st case in the table. The flow domain is kept in the “lowspeed” range, and velocity magnitude is smaller than 0.3 times of the lattice sound speed . Thus, the LBGK model used in current study is suitable [22].

5. Model Validation
Nondimensional timeaveraged streamwise velocity versus is presented in Figure 5. The location is selected before the jet at and to eliminate the influence from the jet. A turbulent boundary layer profile is obtained, consisting of laminar sublayer and log layer, which is consistent with the Von Karman mixing length theory.
Further comparisons with experiments are made in regions where the mainstream is disturbed by the jet as shown in Figure 6. The experiments were done by Meyer et al. [23, 24] using stereoscopic PIV in a JICF problem. Timeaveraged streamwise and spanwise velocities ( and ) are presented in spanwise direction (with respect to ) at locations of , and , . Results show good agreement with experimental data. In current simulation, a uniformly distributed velocity profile is set at the jet exit, while, in experiment, the jet is supplied through a tube and a turbulent profile has been developed inside it. Thus, an overestimation of the jet’s momentum near its boundary of the jet hole can be expected and seen in Figure 6.
6. Results and Analysis
6.1. Energy Spectra
Fast Fourier transformation (FFT) is performed to a time series of turbulent energy at the sample frequency of 2000 at , , and . The turbulent energy is defined as . The corresponding turbulent power spectrum is presented versus frequency as shown in Figure 7. The calculated turbulence decay rate is close to the theoretical canonical value of −5/3 and no obvious spectrum buildup is observed, which indicate the current mesh resolution is adequate.
6.2. Velocity Field and Vorticity
The timeaveraged velocity () and instantaneous velocity () fields in the middle plane of direction () are plotted in Figures 8(a), 8(b), and 8(c), respectively. The corresponding vorticity distributions based on the timeaveraged and instantaneous velocities are also shown in the figure. In Figure 8(a), the vertical jet is bended toward exit by the mainstream flow and phenomenon like flow passing a solid obstacle is observed in the wake region. As shown in the enlarging plot of flow field, the approaching wall boundary layer meets with adverse pressure gradient before the jet and consequently separates and forms the horseshoe vortex. Considering the vorticity field, negative and positive values are observed in leading edge and trailing edge along the jet trajectory and trivial values exist in the wake region. Instantaneous velocity and vorticity fields at time steps 80000 and 120000 are shown in Figures 8(b) and 8(c), respectively. Generation and shedding of KelvinHelmholtz vortices are introduced by the shear layers between the mainstream and the jet on both leading edge and trailing edge. The shed vortices are mixing with the mainstream and dissipated in the downstream region quickly. In the wake region of the jet, wake vortices are observed through those nontrivial vorticity values; however, no vortex tube is directly observed in this cross section view. To compare the timeaveraged fields with their instantaneous counterparts, the KelvinHelmholtz vortices are not available after averaging, which proves the shearlayer induced vortices are inherently unsteady and periodic.
(a) Timeaveraged
(b) Time step = 80000
(c) Time step = 120000
The instantaneous velocity () and timeaveraged velocity () fields are displayed at locations , , and , respectively, in Figure 9. The corresponding vorticity distributions are also displayed in the figure. In Figure 9(a), the instantaneous velocity vectors and vorticity distributions at time step 80000 are shown. Separation of mainstream occurs near the midchord portion. Further, vortex shedding and dissipation are observed in the downstream area. Wake region is formed at the backside of the jet. A characteristic flow pattern of the wake by the transverse jet is obtained which is significantly different from that by a solid cylinder as described in [1]. In Figures 9(b) and 9(c), the timeaveraged velocity vectors and vorticity distributions are displayed. Similar to flow passing a solid cylinder, along the streamwise direction, positive and negative values appear when the mainstream starts to separate. In downstream region, the vorticity recovers to trivial values due to the dissipation of those periodicallyshed vortices.
(a)
(b)
(c)
The timeaveraged velocity () fields are presented in several streamwise planes at locations , , and in Figures 10(a), 10(b), and 10(c), respectively, with vorticity distributions. As shown in Figures 10(a) and 10(b), the CRVP system consists of two layers, the upper one and the lower one. For a round hole, as used in the current study, the rotating directions of the two pairs are the same. This “twodeck” vortices structure is very similar to that observed in [2] by laserinduced fluorescence. By comparing among Figures 10(a), 10(b), and 10(c), it is found that when moving downstream, the vertical component () of the velocity reduces because of the jet’s bending. At the same time, the lower layer vortices lift in vertical direction and weaken in strength. Finally, in far downstream region (), the component is at the same level of component since the jet is almost fully bended and those two layers fully emerge with strength further weakened.
(a)
(b)
(c)
6.3. Turbulent Statistics
Reynolds stress distributions of , , and are plotted in the midplane of direction () in Figures 11(a), 11(b), and 11(c), respectively. The shear stress component of the Reynolds stress tensor is shown in Figure 11(a). Strong anisotropy of the flow field can be observed in the flow field. Negative values of appear in the leading edge of the jet with positive values in the trailing portion. In the wake region, both positive and negative values exist with big nonuniformity. In Figures 11(b) and 11(c), the normal Reynolds stress is displayed. For both and , the highest values appear in the leading and trailing portions of the jet due to large velocity gradients. Right behind the jet in the recirculation zone, both components maintain some nontrivial values, especially , indicating the wrapping motion around the jet by the fluid flow.
(a)
(b)
(c)
6.4. Coherent Structure
The isosurfaces of second invariant of velocity gradient at time steps 80,000 and 120,000 with domain height of are plotted in Figure 12. The definition of is given by (11) in tensor form:where and denote the rate of rotation and rate of shearing, respectively. The hairpinshaped coherent structures are originated from the side and trailing edge of the ejecting hole and wrapping around the jet. In the downstream region, the hairpin structure is weakened and dissipated into the mainstream.
(a) Time step = 80000
(b) Time step = 120000
7. Conclusion
A direct numerical simulation of jet in crossflow based on lattice Boltzmann method is performed on multiGPUs. The current simulation takes about 10 hours based on largest grid of 1.5 × 10^{8}. Several points can be summarized and concluded as follows.(1)A turbulent boundary layer is observed before the jet (, ), consisting of laminar sublayer and turbulent core. After the jet at , and , , timeaveraged streamwise and spanwise velocity components ( and ) are compared with experiments, and good agreement is obtained.(2)The turbulent energy spectrum is plotted versus frequency at , , and . A decay rate close to the theoretical value of −5/3 is observed.(3) velocity vectors and vorticity fields are plotted in , , and direction planes. Unsteady shearlayer vortices are formed and shed along the leading and trailing side of the jet trajectory. Horseshoe vortex is generated in the plate boundary layer before the jet. Twodeck structure of the CRVP is retained and the lower pair is observed to rise along the streamwise direction.(4)The normal components ( and ) and shear component () of the Reynolds stress are revealed in the plane. Strong anisotropy is observed for the flow field due to the shear layer and wake.(5)Coherent structure represented by the second invariant of velocity gradient () is plotted at different time steps. The characteristic hairpin vortices are observed.
Conflict of Interests
The authors declare that there is no any conflict of interests regarding the publication of this paper.
Acknowledgments
The work is supported by 973 Project of China (2013CB035702), the National Natural Science Foundation of China (11302165 and 11402191), and Postdoctoral Science Foundation of China (2013M540741).
References
 T. F. Fric and A. Roshko, “Vortical structure in the wake of a transverse jet,” Journal of Fluid Mechanics, vol. 279, pp. 1–47, 1994. View at: Publisher Site  Google Scholar
 B. A. Haven and M. Kurosaka, “Kidney and antikidney vortices in crossflow jets,” Journal of Fluid Mechanics, vol. 352, pp. 27–64, 1997. View at: Publisher Site  Google Scholar
 S. Y. Chen and G. D. Doolen, “Lattice Boltzmann method for fluid flows,” Annual Review of Fluid Mechanics, vol. 30, pp. 329–364, 1998. View at: Publisher Site  Google Scholar
 Y. L. He, Y. Wang, and Q. Li, Lattice Boltzmann Method: Theory and Applications, Science Press, Beijing, China, 2009.
 L. Chen, Q. Kang, Y. Mu, Y.L. He, and W.Q. Tao, “A critical review of the pseudopotential multiphase lattice Boltzmann model: methods and applications,” International Journal of Heat and Mass Transfer, vol. 76, pp. 210–236, 2014. View at: Publisher Site  Google Scholar
 C. K. Aidun and J. R. Clausen, “LatticeBoltzmann method for complex flows,” Annual Review of Fluid Mechanics, vol. 42, pp. 439–472, 2010. View at: Publisher Site  Google Scholar
 nVIDIA, “NVIDIA CUDA Compute Unified Device Architecture Programming Guide Version 2.0,” nVIDIA, 2008. View at: Google Scholar
 I. Buck, T. Foley, and D. Horn, “Brook for GPUs: stream computing on graphics hardware,” ACM Transactions on Graphics, vol. 23, pp. 777–786, 2004. View at: Google Scholar
 J. Krüger and R. Westermann, “Linear algebra operators for GPU implementation of numerical algorithms,” ACM Transactions on Graphics, vol. 22, no. 3, pp. 908–916, 2003. View at: Publisher Site  Google Scholar
 S. Ogawa and T. Aoki, “GPU Computing for 2dimensional incompressibleflow simulation based on multigrid method,” Transactions of JSCES, Article ID 20090021, 2009. View at: Google Scholar
 W. Xian and A. Takayuki, “MultiGPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster,” Parallel Computing, vol. 37, no. 9, pp. 521–535, 2011. View at: Publisher Site  Google Scholar  MathSciNet
 D. Rossinelli, M. Bergdorf, G.H. Cottet, and P. Koumoutsakos, “GPU accelerated simulations of bluff body flows using vortex particle methods,” Journal of Computational Physics, vol. 229, no. 9, pp. 3316–3333, 2010. View at: Publisher Site  Google Scholar  MathSciNet
 T. Shimokawabe, T. Aoki, J. Ishida, K. Kawano, and C. Muroi, “145 TFlops performance on 3990 GPUs of TSUBAME 2.0 supercomputer for an operational weather prediction,” in Proceedings of the 11th International Conference on Computational Science (ICCS '11), vol. 4, pp. 1535–1544, June 2011. View at: Publisher Site  Google Scholar
 X. Wang and T. Aoki, “High performance computation by multinode GPU clusterTSUBAME 2.0 on the air flow in an urban city using lattice Boltzmann method,” International Journal of Aerospace and Lightweight Structures, vol. 2, no. 1, pp. 77–86, 2012. View at: Publisher Site  Google Scholar
 T. Miki, X. Wang, T. Aoki et al., “Patientspecific modelling of pulmonary airflow using GPU cluster for the application in medical practice,” Computer Methods in Biomechanics and Biomedical Engineering, vol. 15, no. 7, pp. 771–778, 2012. View at: Publisher Site  Google Scholar
 X. Wang, Y. Shangguan, N. Onodera, H. Kobayashi, and T. Aoki, “Direct numerical simulation and large eddy simulation on a turbulent wallbounded flow using lattice boltzmann method and multiple GPUs,” Mathematical Problems in Engineering, vol. 2014, Article ID 742432, 10 pages, 2014. View at: Publisher Site  Google Scholar  MathSciNet
 P. L. Bhatnagar, E. P. Gross, and M. Krook, “A model for collision processes in gases,” Physical Review, vol. 94, no. 3, pp. 511–525, 1954. View at: Publisher Site  Google Scholar
 Y. H. Qian, D. D'Humières, and P. Lallemand, “Lattice BGK models for navierstokes equation,” EuroPhysics Letters, vol. 17, no. 6, pp. 479–484, 1992. View at: Publisher Site  Google Scholar
 Z.L. Guo, C.G. Zheng, and B.C. Shi, “Nonequilibrium extrapolation method for velocity and pressure boundary conditions in the lattice boltzmann method,” Chinese Physics, vol. 11, no. 4, pp. 366–374, 2002. View at: Publisher Site  Google Scholar
 D. Xu, G. Chen, X. Wang, and Y. Li, “Direct numerical simulation of the wallbounded turbulent flow by Lattice Boltzmann method based on multiGPUs,” Applied Mathematics and Mechanics, vol. 34, no. 9, pp. 1–9, 2013. View at: Google Scholar
 F. Muldoon and S. Acharya, “Direct numerical simulation of pulsed jetsincrossflow,” Computers & Fluids, vol. 39, no. 10, pp. 1745–1773, 2010. View at: Publisher Site  Google Scholar
 Z. L. Guo and C. G. Zheng, Theory and Applications of Lattice Boltzmann Method, Science Press, Beijing, China, 2010.
 K. E. Meyer, O. Özcan, P. S. Larsen, and C. H. Westergaard, “Steroscopic PIV measurements in a jet in crossflow,” in Proceedings of the 2nd International Symposium on Turbulence and Shear Flow Phenomena, Stockholm, Sweden, June 2001. View at: Google Scholar
 K. E. Meyer, O. Özcan, P. S. Larsen, and C. H. Westergaard, “Flow mapping of a jet in crossflow with stereoscopic PIV,” Journal of Visualization, vol. 5, no. 3, pp. 225–231, 2002. View at: Publisher Site  Google Scholar
Copyright
Copyright © 2015 Jiang Lei et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.