Channel simulators are powerful tools for testing the performance of the individual parts of a wireless communication system. They are particularly relevant when new communication algorithms are tested, because they allow us to determine whether the algorithms fulfill the requirements of a communications standard. One of these tests consists of evaluating the system performance in the presence of a communication channel. In this sense, it is possible to model the channel as an FIR filter with time-varying random coefficients. Increasing the number of coefficients yields a better approximation of real scenarios, but it also increases the computational complexity. To address this issue, a design methodology for computing the time-varying coefficients of fading channel simulators using consumer-grade graphics processing units (GPUs) is proposed. With GPUs and the proposed methodology, users who are not specialists in parallel computing can accelerate their simulations relative to conventional software. Implementation results show that the proposed approach allows the easy generation of communication channels while reducing the processing time. Finally, the GPU-based implementation outperforms the CPU-based implementation, owing to the scattered nature of the channel.

1. Introduction

Currently, the high demand for integrated services (voice, data, and video) means that new data transmission schemes have to be developed to handle high transmission data rates while offering high levels of quality of service. The fourth generation (4G) of mobile communication systems is still under development; its main goal is to provide a digital communication network (land, mobile, and satellite) with peak data rates of 100 Mbps for high-mobility devices and 1 Gbps for users or devices in low-mobility environments or stationary conditions. The main technologies used in 4G include techniques based on multiple-input and multiple-output (MIMO) antennas, turbo decoding, adaptive modulation, coding schemes and error correction, and orthogonal frequency-division multiplexing (OFDM) [1, 2]. Current versions of standards that incorporate 4G are LTE-A (long term evolution-advanced) and IEEE 802.16m mobile WiMAX (Worldwide Interoperability for Microwave Access). Therefore, the new issues imposed by the standards require new processing algorithms to be tested in high-mobility environments affected by Doppler shifts (time-selective channels) and multipath propagation (frequency-selective channels). The temporal channel variability occurs when the characteristics of the transmission medium change over time or when there is relative motion between the receiver and transmitter, as in communication systems such as LTE and WiMAX. The frequency selectivity appears when multiple copies of the transmitted signal arrive at the receiver due to physical mechanisms such as multipath propagation.

Moreover, determining the behavior or performance of a mobile communication system under real conditions (in situ test) can be very expensive, owing to the transfer of the communications system and test equipment to the place under study, among other issues. Additionally, the system behavior cannot be tested repeatedly under the same propagation conditions due to the nature of the communication channel. Faced with this problem, an economical alternative is to use mathematical models, which represent the radio channels under consideration. In this sense, we can define a channel simulator as a software tool that permits reproduction of the behavior or the propagation conditions of a mobile communications channel under controlled or laboratory conditions.

On the other hand, GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU in order to accelerate scientific, engineering, and business applications [3]. Recently, several works in the wireless communication area that use GPU devices have been published [4–7]. Those works follow an implementation strategy in order to handle the channel complexity using multiple cores. For example, in [4] a wireless channel simulator is implemented. In that work, the potential of GPU-based processing is studied in order to improve the runtime performance of computationally intensive accurate wireless network simulation. In [5], the use of general purpose GPUs is investigated in order to provide the computational capabilities required for performing the radio frequency path loss computation. A discussion of the acceleration of wireless channel simulation using GPUs is provided in [6]. In addition, in [7], an implementation of a parallel lattice reduction-aided 2 × 2 MIMO detector using GPUs for the WiMAX standard is presented.

Although several works related to the use of GPUs in communication systems exist, there are currently no works that describe in detail the implementation of a fading channel simulator based on GPUs. In this paper, the methodology for implementing a fading channel simulator (time and frequency selective) via GPU computing is presented.

The proposed methodology considers the use of common GPU software libraries that permit users who are not specialists in GPU programming to easily implement the proposed simulator. On the other hand, the generation of the Rayleigh fading variates is achieved using the filtering method [8–10]. In this case, the filtering method is carried out in the time domain by using a finite impulse response (FIR) filter for coloring Gaussian noise samples. Furthermore, it is well known that if the filter order is increased, then the accuracy of the channel statistics can be improved, though at the cost of increasing the computational complexity. Therefore, in this work, we take advantage of GPUs for handling such computational complexity (multiplication and addition operations) in order to implement an accurate communication channel for SISO systems. Moreover, this methodology paves the way for implementing MIMO channel simulators in the future.

The rest of this paper is organized as follows. In Section 2, the background of the wireless communication system is stated, specifically as regards the communication channel model. In Section 3, the simulation of the communication channel is explained. Next, in Section 4, the GPU implementation of the fading channel simulator is detailed. Section 5 is devoted to presenting the implementation results when a WiMAX scenario is considered. Finally, the conclusions are presented in Section 6.

2. Communication System

Consider a single-input and single-output (SISO) communication system where the transmission of in-phase and quadrature signals modulated by orthogonal carriers and , respectively, are assumed, which are mixed for obtaining . This signal is propagated through the communication channel , which is considered to be a causal time-varying linear system. The signal filtered by the channel reaches the receiver where a noisy version is detected. It can be expressed mathematically as follows:where , and is a time variable. The impulse response states the response of the channel in the instant when a stimulus is applied in , which reflects the time variability of the channel impulse response. Likewise, is the aggregated stochastic noise. This received signal is demodulated in order to obtain the in-phase and quadrature signals and .

For the sake of simplicity, if and , where is any carrier frequency and is any phase, the system becomes the well-known single-carrier communication system. It is important to emphasize that an OFDM system implemented with IFFT/FFT produces a base-band signal that is modulated as in a single-carrier system.

If we consider that both signals and are band limited to a maximum frequency of and (this condition is always satisfied in real communication systems), it is easy to demonstrate [11, 12] with the aid of the Hilbert transform the existence of base-band equivalent signals , , , and for , , , and , respectively. In general, these equivalent base-band signals are complex, where the real part corresponds to the in-phase component and the imaginary part to the quadrature component; thus, and for . The relations between the original pass-band signals and their base-band equivalents are as follows [12]: where is the real part of the complex number in parentheses. Considering (2), the base-band equivalent of (1) is which can be interpreted as a collection of multiple paths (scatters), where the transmitted signal is propagated. The fact that these paths have different lengths and pass through different conditions of propagation causes the received signal from a specific path to be a delayed, attenuated, and phase-shifted version of the . In this sense, for a specific time and a specific delay , the channel coefficient will be a complex variable, where the magnitude represents the attenuation factor and the phase represents the phase shift factor. On the other hand, due to the constant changes in the environment and the possible relative movement between transmitter and receiver, these factors are time dependent. According to [12], can be modeled as a complex stochastic process composed of the sum of a deterministic part (the ensemble average of ) and a random part (zero mean random process). From this point, we will only consider the random part (an assumption generally accepted when a channel simulator is developed). The autocorrelation function of this random process is equal to where is the expectation operator and represents the complex conjugate. This channel model is difficult to implement; nevertheless, some assumptions can be made which simplify the model. The first is the absence of correlation between the different scatters, and the second is that each scatter is a wide-sense stationary process, which together comprise the well-known wide-sense stationary uncorrelated scattering (WSSUS) model. Therefore, (4) transforms into where , , and is the autocorrelation function with respect to the time difference variable for the scatter located in the delay variable . From (5), it is possible to calculate the scattering function, which is defined as the Fourier transform of the correlation function with respect to the time difference variable , as follows: where is the Fourier transform operator. This scattering function gives the Doppler spectrum for a given delay value in the variable .

In many communication standards, a discrete number of scatters are considered instead of a continuous number, as suggested in previous equations. If this assumption is considered, thenwhere is an index variable that enumerates the discrete scatters and is a complex variable that encloses the gain and phase shift factor of such scatter. If a WSSUS channel is considered, the correlation function of (7) iswith scattering function
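For reference, a discrete-path WSSUS channel is commonly written in the form sketched below. The symbols used here ($L$ paths, complex path processes $a_l(t)$, path delays $\tau_l$, time difference $\Delta t$, and Doppler variable $\nu$) are notation assumed for this sketch and may differ from the notation of (7)–(9).

```latex
\begin{align*}
h(t,\tau) &= \sum_{l=0}^{L-1} a_l(t)\,\delta(\tau-\tau_l), \\
R_h(\Delta t;\tau) &= \sum_{l=0}^{L-1} R_{a_l}(\Delta t)\,\delta(\tau-\tau_l),
\qquad R_{a_l}(\Delta t) = E\{a_l^{*}(t)\,a_l(t+\Delta t)\}, \\
S(\nu;\tau) &= \sum_{l=0}^{L-1} \mathcal{F}_{\Delta t}\{R_{a_l}(\Delta t)\}\,\delta(\tau-\tau_l).
\end{align*}
```

Under these assumptions, each path contributes its own Doppler spectrum at its own delay, which is why the simulator can generate and color every path independently.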

3. Channel Simulation

In order to perform a computational simulation of the communication channel, it is necessary to deal with the discrete version of the baseband equivalent channel presented in (7). This discrete channel results from band-limiting and sampling (7) in the time and time-delay domains at a rate of . Thus, it is defined as where , , the symbol represents the convolution operator, and is a function for band-limiting the channel to , which, for practical purposes, could be a time-windowed cardinal sine function. Substituting (7) into (10) results in where corresponds to the coefficients of the FIR filter for simulating the communication channel, enumerates the samples in the time domain, and enumerates the taps of the filter. Likewise, can be calculated as , where is the maximum delay of the paths in the channel , and is the length of the filter . This filter could be anticausal; nevertheless, it is possible to introduce a delay in order to convert this filter into a causal filter, which is therefore physically feasible.
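As a rough illustration of how time-varying FIR coefficients act on a transmitted baseband signal, the following host-side C++ sketch applies one set of taps per output sample. The function name `apply_time_varying_fir`, the container `taps`, and its one-row-per-sample layout are assumptions made for this example; they are not part of the proposed simulator.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Applies a causal time-varying FIR channel:
//   y[n] = sum_k taps[n][k] * x[n - k]
// taps[n] holds the coefficients valid at sample n (hypothetical layout).
// Input samples before the start of x are treated as zero.
std::vector<cfloat> apply_time_varying_fir(
    const std::vector<cfloat>& x,
    const std::vector<std::vector<cfloat>>& taps)
{
    std::vector<cfloat> y(x.size(), cfloat(0.0f, 0.0f));
    for (std::size_t n = 0; n < x.size() && n < taps.size(); ++n) {
        for (std::size_t k = 0; k < taps[n].size() && k <= n; ++k) {
            y[n] += taps[n][k] * x[n - k];
        }
    }
    return y;
}
```

The inner loop is the multiply-and-add workload that grows with the number of taps, which is the cost the GPU implementation is meant to absorb.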

In order to implement (11), it is necessary to generate uncorrelated discrete Gaussian stochastic complex processes at rate . Many algorithms for obtaining these stochastic processes have been reported in the literature, as mentioned in [13–16] and references therein. Such processes must be filtered (colored) in order to achieve the desired scattering function. It is important to note that these filters only affect the frequency components below a maximum Doppler frequency ; therefore, it is possible to generate the samples at a rate of at least , where typically , and then to use any upsampling technique to reach the rate.

The impulse response of the filter for coloring the th process is the discrete version (at rate ) of the following expression:

Finally, an interpolation technique such as splines, polynomial, or basis expansion is used for obtaining the samples at rate. The entire process is presented in Figure 1 and summarized in Algorithm 1.

Require: Scattering function
Require: Define the gains that correspond to the variance of the process for all the paths
(1) for all   such that   do
(2)  Generate the zero mean unitary variance complex Gaussian stochastic process at rate samples per second
(3)  Multiply the stochastic process by for ensuring the gains of the paths
(4)  Filter the process with discrete
(5)  Interpolate the process for obtaining samples at rate
(6) end for
(7) for all    do
(8)  Obtain filter’s coefficients
(9) end for

4. GPU Implementation

The emergence of GPUs has allowed complex algorithms to be executed almost in real time. A GPU is conceptualized as a set of streaming multiprocessors (SMs), where each SM is characterized by a single instruction multiple data (SIMD) architecture. Therefore, in each clock cycle, each processor of the multiprocessor executes the same instruction, operating on multiple data streams. Each of these processors can access a shared memory (common to all processors belonging to the same SM) and a local cache memory. In addition, all the processors have access to the global GPU (device) memory. Figure 2 illustrates the GPU hardware architecture.

Our strategy for implementing the fading channel simulator is aimed at improving the overall performance by chaining software functions (called kernels) representing each communication step. In order to implement the parallel fading simulator as illustrated in Figure 3, we distinguish five stages in the GPU design methodology as follows.

4.1. Gaussian Random Number Generator

In this stage, the CUDA Random Number Generation (cuRAND) library [17] is employed to obtain Gaussian random numbers (GRN) through the efficient generation of high-quality pseudorandom numbers. In particular, the curand_init function is called to initialize the random number generator states in a massively parallel scheme. There are seven types of random number generators in cuRAND; in this study, we have selected the XORWOW algorithm, which is a member of the xorshift family of pseudorandom number generators, with customized parameters for operating on GPUs.

The  curand_normal2  function generates two normally distributed pseudorandom numbers in each call. Because the underlying algorithm is based on the Box-Muller transform, it is suitable for generating random complex numbers; that is, each call generates real and imaginary parts at the same time.
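To make the link between the Box-Muller transform and complex Gaussian samples concrete, the following host-side C++ sketch maps two uniform variates to one complex sample. It is only an illustration of the transform itself; the function name `box_muller` is hypothetical, and the internal implementation of the GPU library may differ in detail.

```cpp
#include <cmath>
#include <complex>

// Box-Muller transform: maps two independent uniform variates, u1 in (0, 1]
// and u2 in [0, 1), to two independent standard normal variates, returned
// here as the real and imaginary parts of one complex sample.
std::complex<float> box_muller(float u1, float u2)
{
    const float two_pi = 6.28318530717958647692f;
    const float radius = std::sqrt(-2.0f * std::log(u1));
    const float angle = two_pi * u2;
    return std::complex<float>(radius * std::cos(angle),
                               radius * std::sin(angle));
}
```

Because the real and imaginary parts each have unit variance, the complex sample has a total variance of two; dividing by the square root of two yields a unit-variance complex process when that normalization is required.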

A CUDA kernel computes a set of independent GRN vectors. Each vector corresponds to a path, which is computed in chunks by the GPU multiprocessors and then stored in device global memory. The implementation of the GRN generator is presented in Algorithm 2, where the function setup_kernel initializes the threads of the same block with a different sequence number but the same seed and offset (zero offset). Furthermore, generate_normal_kernel computes several pseudorandom values with Gaussian distribution by calling curand_normal2.

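#include <curand_kernel.h>

__global__ void setup_kernel(curandState *state)
{
    int id = threadIdx.x + blockIdx.x * 6;
    /* Same seed within a block, a different sequence number per thread, zero offset */
    curand_init(1234 * blockIdx.x, id, 0, &state[id]);
}

__global__ void generate_normal_kernel(curandState *state, int n, float *result)
{
    int id = threadIdx.x + blockIdx.x * 6;
    float2 x;
    curandState localState = state[id];
    for (int i = 0; i < n; i++)
    {
        /* Generate pseudorandom normals: x.x and x.y are two independent variates */
        x = curand_normal2(&localState);
        result[id] = x.x;
    }
    /* Copy state back to global memory */
    state[id] = localState;
}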

4.2. Parallel Doppler FIR-Filter

The Doppler filter uses the coefficients obtained by sampling (12) and the random numbers generated in the previous subsection. Since the filter coefficients are fixed for all channel realizations and paths, they are stored in the constant memory of the GPU. This memory is devoted to storing and broadcasting read-only data to all threads on the GPU. In addition, the generated GRN are stored in shared memory, since many threads must access them simultaneously. The filtering is conceptualized as a convolution, so a kernel that performs the convolution in parallel is used.

There is a set of independent 1D signal convolutions to be computed, one for each path. In the proposed implementation, the filtering is performed using the NVIDIA Performance Primitives (NPP) library [18]; specifically, one of the nppiFilterRow functions is used, which performs 1D filtering on 2D data, each row being a channel path.
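The row-wise filtering described above can be pictured with the following host-side C++ sketch, which convolves every row of a 2D array (one row per path) with the same FIR kernel. The function name `filter_rows`, the use of real-valued coefficients, and the zero-padded causal border are assumptions for this example; the border handling and anchor conventions of the GPU library routine may differ.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Filters every row (one row per path) with the same real FIR kernel h:
//   out[p][n] = sum_m h[m] * in[p][n - m]
// Samples before n = 0 are treated as zero.
std::vector<std::vector<cfloat>> filter_rows(
    const std::vector<std::vector<cfloat>>& in,
    const std::vector<float>& h)
{
    std::vector<std::vector<cfloat>> out(in.size());
    for (std::size_t p = 0; p < in.size(); ++p) {
        out[p].assign(in[p].size(), cfloat(0.0f, 0.0f));
        for (std::size_t n = 0; n < in[p].size(); ++n) {
            for (std::size_t m = 0; m < h.size() && m <= n; ++m) {
                out[p][n] += h[m] * in[p][n - m];
            }
        }
    }
    return out;
}
```

Every output sample depends only on the input row and the shared kernel, so rows and samples can be computed independently, which is what makes this stage well suited to a parallel device.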

4.3. Path Gain Implementation

The path gain is implemented with a multiplication function. The resulting colored noise from the previous stage is multiplied by a scalar. This could be carried out with a specific kernel or by using a standard library, such as CUDA Basic Linear Algebra Subroutines (cuBLAS) [19] or  npp. The proposed implementation uses the  nppiMulC  function of the  npp  library.
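A minimal host-side C++ sketch of this scaling step follows. It assumes that the relative path power is specified in dB and that the scalar applied to unit-variance colored noise is the square root of the corresponding linear power; both the function names and this dB convention are assumptions made for illustration.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Converts a relative path power in dB to a linear amplitude factor
// (the square root of the linear power).
float db_to_amplitude(float power_db)
{
    return std::sqrt(std::pow(10.0f, power_db / 10.0f));
}

// Scales one path of colored noise in place so that a unit-variance input
// ends up with the variance implied by power_db.
void apply_path_gain(std::vector<std::complex<float>>& path, float power_db)
{
    const float gain = db_to_amplitude(power_db);
    for (auto& sample : path) {
        sample *= gain;
    }
}
```

For example, under these assumptions a path at -20 dB is scaled by an amplitude factor of 0.1.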

4.4. Upsampler

The upsampler stage is responsible for generating noise samples at the rate , implemented as an interpolation. The usual interpolation available for GPUs is the linear interpolation offered by texture memory; npp offers other methods for more accurate results. In this case, the nppiResize function with a cubic interpolation is used. It returns the interpolated value for a given coordinate between two known noise values.

4.5. Tap Generator

Up to this point, the multiple paths have been treated separately. In this stage, they are combined using predefined (computed offline) coefficients according to (11). This operation can be seen as the multiplication of the upsampled, scaled, colored noise (paths) by the coefficient matrix . This could be carried out with a programmer's own implementation or by using a standard library such as cuBLAS. This proposal uses the cublasSgemm routine, which performs a matrix-matrix multiplication with optional scalar scaling.
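The matrix product behind the tap generator can be sketched on the host as follows. The function name `generate_taps`, the K-by-L layout of the coefficient matrix, and the L-by-N layout of the path matrix are assumptions for this example. The sketch also assumes real-valued coefficients, in which case the real and imaginary parts of the paths could be handled as separate real matrices by a single-precision real matrix multiplication routine.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Combines L path processes into K filter taps:
//   taps[k][n] = sum_l coeff[k][l] * paths[l][n]
// coeff is K x L (precomputed offline), paths is L x N (one row per path).
std::vector<std::vector<cfloat>> generate_taps(
    const std::vector<std::vector<float>>& coeff,
    const std::vector<std::vector<cfloat>>& paths)
{
    const std::size_t num_samples = paths.empty() ? 0 : paths[0].size();
    std::vector<std::vector<cfloat>> taps(
        coeff.size(), std::vector<cfloat>(num_samples, cfloat(0.0f, 0.0f)));
    for (std::size_t k = 0; k < coeff.size(); ++k) {
        for (std::size_t l = 0; l < coeff[k].size() && l < paths.size(); ++l) {
            for (std::size_t n = 0; n < num_samples && n < paths[l].size(); ++n) {
                taps[k][n] += coeff[k][l] * paths[l][n];
            }
        }
    }
    return taps;
}
```

Each row of the result is the time evolution of one FIR tap, obtained as a fixed linear combination of the independent path processes.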

5. Implementation Results

In order to corroborate the functionality of the proposed fading channel simulator in modern communication systems such as WiMAX, it was configured with the following parameters [20, page 404]: a maximum Doppler frequency Hz and a sample rate  Msps, . In addition, the vehicular class B ITU multipath channel model was considered, which consists of six discrete paths with relative power  dB at delay time  nsec, respectively. For implementing the filter , a raised cosine function with a roll-off factor of and a duration of  sec was considered. This delay results in the generation of taps. In Figure 4, a resulting GPU-based realization of the fading channel according to the specified parameters for time samples is presented. It is important to note that the offline computed data (see Figure 3) are transferred to the GPU simulator via text files.
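The offline coefficient computation with a raised cosine band-limiting function can be sketched as follows. The function names, the choice of the sample period `Ts` as the pulse period, and the `causal_shift` parameter that delays the filter to make it causal are all assumptions for this illustration; no parameter values from the experiment are implied.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Raised-cosine pulse with period T and roll-off beta in [0, 1].
double raised_cosine(double t, double T, double beta)
{
    const double pi = 3.14159265358979323846;
    const double x = t / T;
    const double sinc = (std::fabs(x) < 1e-12) ? 1.0 : std::sin(pi * x) / (pi * x);
    const double denom = 1.0 - 4.0 * beta * beta * x * x;
    if (std::fabs(denom) < 1e-9) {
        // Removable singularity at |t| = T / (2 * beta): use the limit value.
        const double s = 1.0 / (2.0 * beta);
        return (pi / 4.0) * std::sin(pi * s) / (pi * s);
    }
    return sinc * std::cos(pi * beta * x) / denom;
}

// Builds a num_taps x num_paths matrix whose entry (k, l) is the pulse
// sampled at k * Ts - causal_shift - path_delays[l].
std::vector<std::vector<double>> coefficient_matrix(
    const std::vector<double>& path_delays, double Ts, double beta,
    std::size_t num_taps, double causal_shift)
{
    std::vector<std::vector<double>> c(
        num_taps, std::vector<double>(path_delays.size(), 0.0));
    for (std::size_t k = 0; k < num_taps; ++k) {
        for (std::size_t l = 0; l < path_delays.size(); ++l) {
            const double t = static_cast<double>(k) * Ts - causal_shift - path_delays[l];
            c[k][l] = raised_cosine(t, Ts, beta);
        }
    }
    return c;
}
```

Since the matrix depends only on the path delays and the pulse parameters, it can be computed once before the simulation and reused for every channel realization.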

The simulation was carried out on an iMac computer with the following specifications: OS X 10.9.4 (Mavericks), an Intel Core i5 processor (3.4 GHz), 16 GB of RAM, and a GeForce GTX 780M graphics card with 4 GB of RAM and 1536 CUDA cores.

For evaluating the time performance, the parameters used in the previous test have been maintained; however, the parameter was fixed to samples. In this sense, Table 1 presents the average, maximum, and minimum time consumption for a CPU-based implementation (Matlab) versus the proposed GPU-based methodology (CUDA). It is clear that the GPU methodology has gains of -fold (mean value) when compared with CPU-based implementations, which is attractive if parallel versions of the channel simulator are required, as could be the case in MIMO applications.

Table 2 reports the time percentage for accomplishing each task of the channel simulator in the GPU. It should be noted that in this table the reading and device memory allocation—the most time-consuming tasks—are not considered. These tasks are performed only once at the initialization stage of the simulation.

On the other hand, Table 3 and Figure 5 present the overall time consumption in milliseconds for CPU- and GPU-based implementations when the number of samples is fixed to = 5120, 10240, 20480, 81920, 327680, 655360, 1000000, and samples. This shows that while the time consumption of the CPU-based implementation increases exponentially, it remains almost linear in the GPU-based implementation.

Similarly, the good performance achieved with the GPU implementation with respect to the CPU implementation can be observed in the x-fold gain reported in Table 3. This gain is calculated as the quotient of the time consumption of the CPU implementation and that of the GPU implementation. The gain is reported for each of the sample counts stated in the previous paragraph.

Finally, it is important to emphasize that the presented approach can deal with several path realizations. This suggests that the developed fading channel simulator can be considered for generating large MIMO channels, which represents a new simulation paradigm.

6. Conclusions

The principal result of this study is the introduction of a methodology for designing fading channel simulators on GPU devices. Such a methodology permits users who are not specialists in parallel computing to easily implement channel simulators in parallel. As was shown, the use of GPUs in the development of fading channel simulators greatly reduces the simulation time when channel realizations are generated for testing communication systems. Moreover, a case study for WiMAX systems demonstrated the functionality of the implemented channel simulator. We believe that the proposed parallel channel simulator can aid in testing mobile communication systems based on LTE and WiMAX. Additionally, the presented GPU-based approach will allow the design of more sophisticated simulators of complex channel models such as triply selective MIMO fading channels (i.e., time, frequency, and space selective).

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

This work was supported by the Programa para el Desarrollo Profesional Docente (PRODEP) 2014 and CONACYT, Ciencia Básica, 2014 (CB2014-241272), Mexico.