Abstract

Providing global coverage for ubiquitous users is a key requirement of the fifth generation (5G) and beyond wireless technologies. This can be achieved by integrating airborne networks, such as unmanned aerial vehicles (UAVs) and satellite networks, with terrestrial networks. However, the deployment of airborne networks in a three-dimensional (3D) or volumetric space requires a new understanding of the propagation channel and its losses in both the areal and altitude dimensions. Despite significant research on radio environment map (REM) construction, much of it has been limited to two-dimensional (2D) contexts. This neglects the altitude-related characteristics of electromagnetic wave propagation and confines REMs to 2D formats, which limits the comprehensive and continuous visualization of propagation environment variation in spatial dimensions. This paper proposes a volumetric REM (VREM) construction approach to compute 3D propagation losses. The proposed approach addresses the limitations of existing approaches by learning the spatial correlation of wireless propagation channel characteristics and visualizing REM in areal and height/altitude dimensions using deep learning models. Specifically, the approach uses two deep learning-based models: volume-to-volume (Vol2Vol) VREM with 3D-generative adversarial networks and sliced VREM with altitude-aware spider-UNets. In both cases, knowledge of the propagation environment and transmitter locations in 3D space is used to capture the spatial and altitude dependency of the propagation channel’s characteristics. We developed the Addis dataset, a large REM dataset comprising 54,000 samples collected from the urban part of Addis Ababa, Ethiopia, to train the proposed models. Each sample comprises a 512-meter by 512-meter area with different 3D obstacles (buildings and terrain), 15 simulated propagation loss maps at a 3-meter altitude resolution, and 80 different 3D transmitter locations.
The results of training and testing the proposed models reveal that the constructed VREMs are statistically comparable to the simulated ground-truth maps. In particular, the Vol2Vol approach has a minimum L1 loss of 0.01, which further decreases to 0.0084 as the line-of-sight (LoS) probability increases to 0.95.

1. Introduction

Fifth-generation (5G) wireless networks and beyond are being designed to provide global, ubiquitous connectivity while meeting demanding service requirements such as high spectral efficiency, throughput, energy efficiency, reliability, and guaranteed low end-to-end latency. Key enabling technologies for 5G to meet the service requirements include network densification, massive multiple-input multiple-output (MIMO) antenna systems, and millimeter wave (mmWave) communications. In addition, the integration of terrestrial and nonterrestrial networks, such as satellite and unmanned aerial vehicle (UAV) networks, is being researched to extend coverage over a wider area, increase capacity, improve reliability, and enable new applications [1].

UAVs, in particular, are expected to play a critical role in future wireless networks due to their high mobility, adaptive altitude, and line-of-sight (LoS) communication capabilities [2, 3]. Figure 1 depicts an example of a terrestrial cellular network that leverages UAVs to improve its service delivery. The UAV can function as aerial user equipment (AUE) or an aerial base station (ABS), both operating at altitudes of tens to hundreds of meters. AUEs and ABSs are used to offload traffic in cellular hotspots [4, 5] and for remote sensing, remote monitoring and control, surveillance and security, emergency communications, and entertainment, to name a few. The cellular network, on the other hand, is represented by a ground-based base station (GBS) that serves both ground user equipment (GUEs) and aerial user equipment (AUEs) positioned at varying elevations based on the user’s location.

Despite the advantages, the deployment of UAV-assisted networks poses a number of challenges. First, network planning must take into account the additional degrees of freedom introduced by the different altitudes of the radio elements, such as GUEs, AUEs, and ABSs. This includes the definition of coverage and capacity requirements, as well as the optimization of the UAV trajectory and power consumption. Second, the intricate three-dimensional (3D) mobility of ABSs and AUEs introduces a dynamic aspect to LoS communication. The continuous transition from LoS to nonline-of-sight (NLoS) conditions can lead to fluctuations in signal quality, potentially compromising the cellular network’s ability to maintain consistent and reliable communication channels and services [2, 6]. Furthermore, existing terrestrial cellular networks are inherently designed to accommodate GUEs and GBSs operating within a two-dimensional (2D) plane. This design approach may limit the networks’ effectiveness in serving ABSs and AUEs, which operate at varying altitudes and across complex 3D terrains. Consequently, signal strength, quality, and reliability may face challenges, necessitating tailored strategies to address these altitude-driven variations.

In a broader perspective, the planning and deployment of both terrestrial and aerial network components within UAV-assisted networks introduces complexities. The inherently 3D nature of the propagation environment, which is characterized by irregular terrain shapes and random receiver locations, requires careful consideration. Overcoming these challenges helps with optimizing UAV placement strategies and resource allocations to ensure seamless communication and coverage across the diverse spatial dimensions introduced by UAVs [7].

1.1. Radio Environment Maps for UAV-Assisted Cellular Networks

In the context of UAV-assisted cellular networks, 3D radio propagation models play a critical role in accurately characterizing the signal propagation environment. These models are predictive tools that mathematically simulate how radio waves propagate in various complex 3D environments, estimating parameters like path loss and signal strength. They are particularly relevant given the unique challenges posed by UAV network elements, such as their 3D mobility and the presence of obstacles in the environment [8–10]. Existing models encompass a wide range of methodologies, from measurement-based empirical models to more sophisticated analytical models, such as ray-tracing-based models, that account for intricate environmental details. Ray-tracing models excel at accurately simulating signal propagation in a 3D environment by tracing individual ray paths, particularly when utilizing real terrain data and detailed geometry. However, their accuracy comes with computational complexity due to extensive calculations. Moreover, they are inherently site-specific, meaning they must be re-generated for each new environment and deployment scenario [10, 11].

In that aspect, Radio Environment Maps (REMs) have recently been explored in the context of UAV-assisted cellular networks to address different challenges, such as placement and path-planning optimization and user association [12–16]. REMs are essentially a type of fine-grained digital radio map that provides detailed information about signal strength, quality, coverage, and propagation characteristics across a specific geographical region [17]. The strength of REMs lies in their capacity to incorporate real-world data, such as building layouts, terrain features, obstacles, and network configurations, into their visual depiction of how signal quality varies across space. This information can be used to optimize the placement, operation, and performance of UAVs in cellular networks. For example, REMs can be used to optimize the placement of ABSs to maximize coverage and capacity, plan the optimal trajectory for ABSs to minimize interference, optimize antenna configurations, and allocate radio resources to users.

The construction of REMs involves collecting data through measurements, simulations, or a combination of both. Creating REMs based on measurement data such as environment-specific real terrain maps or signal characteristics parameters such as received signal strength (RSS) offers high accuracy as it captures real conditions and potential signal variations. However, it can be resource-intensive, demanding both time and cost, making it suitable for specific applications where precision is paramount [18–20]. On the other hand, REMs solely based on data from simulation employ computational models to predict radio wave behaviors across the area of interest. Although less accurate compared to measurement-based approaches due to the simplified nature of the models and the variability of the environment, simulations are efficient and allow the assessment of a broader spectrum of scenarios. As mostly preferred, combining the two approaches strikes a balance between accuracy and efficiency as it reduces the need for extensive measurement collection [21].

1.2. Deep Learning for Constructing REM

Various approaches exist to construct REMs, which can be broadly categorized into two main methods: model-driven methods and data-driven learning methods. Model-driven methods, such as Inverse Distance Weighting (IDW) [22], Kriging [19, 23], and Bayesian models [24–27], rely on predefined mathematical models and parameters to describe the relationship between environmental features and radio wave behavior. These models are usually based on simplified assumptions and prior knowledge of the propagation environment. While model-driven methods can provide valuable insights, they may struggle to accurately capture complex interactions, especially in scenarios with numerous variables or dimensions.

On the other hand, data-driven methods, especially those that are based on machine learning algorithms, do not rely on explicitly defined model parameters. Instead, they learn the nonlinear spatial characteristics of the radio environment, along with relevant propagation effects like blockage, reflection, and scattering, directly from the provided data [28]. Such methods usually involve model training using limited historical measurement data, allowing them to generalize and make predictions (interpolate or extrapolate) for unmeasured locations.

Particularly in deep learning, convolutional neural network (CNN)-based models have shown remarkable capabilities in constructing REMs [26, 27, 29–35]. CNNs are a type of deep learning algorithm that is well suited for image processing tasks. They can be used to extract features from images, such as edges, shapes, and textures. By leveraging the power of CNNs, the REM construction problem can be redefined as an image processing or feature extraction problem. This simplifies the process of training the propagation loss predictor and building REMs for all GBS, GUE, AUE, and ABS locations.

Constructing a complete REM with measurement-based data is bound to give a more accurate estimation. However, there are practical limitations to this approach, such as determining the optimal number of measurements and the acquisition of the measurements (either with dedicated sensor networks or crowd-sourcing). To compensate for these limitations, more advanced neural network architectures, such as generative adversarial networks (GANs) in [30] and deep CNN-based autoencoder architectures called UNets in [32, 33, 35], are proposed. These architectures can learn to generate synthetic measurements that are similar to real ones, even when the number of real measurements is limited. This capability can help improve the accuracy of REM construction, particularly in cases where collecting a large number of measurements is not possible. However, the aforementioned papers solely considered a 2D environmental map displaying the location of obstructions (building topology) and the transmitter (along with sparse measured samples in the case of RadioUNet in [35]) to estimate the propagation loss at any point within a specified area.

In the context of UAV-assisted networks, the works in [19, 29] address the additional degree of freedom introduced by altitude when creating 2D-REMs. In [19], measurements from UAVs across frequencies and altitudes, coupled with a Kriging model, were used to formulate discrete 2D-REMs at varying altitudes. On the other hand, in [29], the effectiveness of 3D environmental maps in capturing complex radio wave interactions influenced by altitude was highlighted.

Despite substantial research on REM construction, much has been confined to 2D contexts, disregarding altitude-related characteristics of radio wave propagation or confining REMs to 2D formats. This limitation restricts the comprehensive and continuous visualization of propagation environment variations in spatial dimensions, thereby diminishing the applicability of REMs across scenarios, including UAV-assisted cellular networks.

Hence, our study introduces the concept of volumetric REM (VREM), which aims to capture spatial correlation and visualize REM in three dimensions, i.e., 2D areal and height/altitude dimensions. Our work formulates the VREM problem as an image translation challenge, exploring two approaches: volume-to-volume (Vol2Vol) and sliced-map construction (sliced-VREM). In the Vol2Vol method, a deep learning model is trained to generate the VREM representation by utilizing the intricate details of the 3D environmental map and transmitter coordinates. With the sliced-VREM, the complexity associated with learning from volumetric data is addressed by capturing the altitude dependency of propagation characteristics from stacked 2D environmental maps and transmitter location information. To that end, the main contributions of the paper are summarized as follows:
(i) We propose two VREM construction approaches, namely, the Vol2Vol-VREM and sliced-VREM construction techniques, in which knowledge of the 3D environment, transmitter location information, and simulated REM data are used to train models that capture the spatial correlation and altitude dependency of the propagation channel’s characteristics.
(ii) As 3D measurement data are not readily available to train the proposed models, a commercial wireless propagation and radio network planning software called WinProp, part of the Altair Feko 2021.1 suite, is used to simulate actual radio propagation and generate a large set of REM data. The simulation takes into account the 3D environment geometry and Long-Term Evolution-Advanced (LTE-A)-based network configuration parameters, including the location and height of a particular transmitter. We named the data the Addis dataset; it is developed for the urban part of the city of Addis Ababa, Ethiopia. Comprising 54,000 samples, the dataset encompasses 225 distinct geographical regions, each defined by 3D terrain and building information. Ground-level areas measuring 512 meters by 512 meters are considered, and simulated REMs are generated at every 3-meter height interval. The dataset encompasses three propagation links: the GBS-GUE link, the GBS-AUE link, and the ABS-GUE link, for 80 arbitrary 3D transmitter locations.

To the best of our knowledge, the Addis dataset is the first of its kind, as it provides 3D information about radio wave propagation. The dataset’s comprehensive coverage of diverse propagation environments, spanning 225 different geographical areas, empowers the VREM model to yield reliable results even in unfamiliar settings.

The remainder of the paper is organized as follows: The system model and the problem formulation approach are discussed in Section 2. The two proposed VREM construction approaches are presented in Section 3. The model training, map construction, and performance evaluation of the proposed approaches are discussed in Section 4, and conclusions and future remarks on the work are given in Section 5.

2. System Model

2.1. Scenario Definition and Propagation Loss Model

To address the limitations posed by the lack of available measurement data for training the deep learning VREM construction model, we generated synthetic propagation loss data using a realistic 3D environment map and a simplified ray-tracing propagation loss model known as the dominant path model. This involved setting up two distinct scenarios within a 3D environment, as illustrated in Figure 2. Scenario 1 consists of a GBS located within a designated spatial region to serve GUEs and AUEs at varying altitudes. Similarly, Scenario 2 involves a low-altitude ABS serving GUEs while hovering over a specific geographic region. In both scenarios, a 3D Cartesian coordinate system is used, defining q_tx = (x_t, y_t, h_t) and q_rx = (x_r, y_r, h_r) as the coordinates of the GBS/ABS and GUE/AUE locations within the environment. Here, h_t indicates the height of the GBS antenna above the ground in the first scenario or the hovering altitude of the ABS in the second scenario.

The 3D environment depicted in Figure 2 is characterized by environment-dependent coefficients, as outlined in the ITU recommendation report [10]. These coefficients include parameters such as the mean building width, the number of obstacles (mainly buildings), and the maximum height of the obstacles along the LoS link between the transmitter and receiver. In both scenarios, we focus on the downlink propagation channel, where a signal with power P_t is transmitted (by the GBS or ABS) and received by a receiver (AUE or GUE) at a ground distance r and height h_r, with received signal strength P_r.

Propagation channel modeling in UAV-assisted networks involves both air-to-ground and terrestrial channel considerations. Due to the similarity in channel statistics for both cases, the propagation loss is generally defined between any transmitter and receiver, factoring in distance, environment-dependent path loss, and shadowing effects [8]. Depending on whether the propagation path is under LoS or NLoS conditions, the loss between the transmitter at q_tx and the receiver at q_rx is quantified in decibels (dB) as

Λ(q_tx, q_rx) = β + 10 α log10(d) + η,

where β, α, and η are the propagation channel coefficients that denote the mean free space loss, the path loss exponent, and the shadowing factor of the propagation, respectively. The Frobenius norm d = ||q_tx − q_rx||_F is the distance between the transmitter and receiver, measured in meters.
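As a minimal sketch, the log-distance loss with shadowing described above can be evaluated numerically as follows; the function name and parameter defaults are illustrative, not part of the paper:

```python
import math

def path_loss_db(d_m, beta_db, alpha, eta_db=0.0):
    """Log-distance propagation loss in dB: beta + 10*alpha*log10(d) + eta.

    d_m:     transmitter-receiver distance in meters (must be > 0).
    beta_db: mean free-space loss at the reference distance (dB).
    alpha:   path loss exponent (~2 in free space, up to ~6 in dense urban NLoS).
    eta_db:  shadowing term in dB (0 keeps only the deterministic part).
    """
    if d_m <= 0:
        raise ValueError("distance must be positive")
    return beta_db + 10.0 * alpha * math.log10(d_m) + eta_db

# Deterministic loss at 100 m with alpha = 2 and beta = 40 dB:
# 40 + 10*2*log10(100) = 80 dB
```

Because the NLoS exponent is larger, the same geometry under NLoS conditions yields a steeper loss-versus-distance curve than under LoS.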

For a given operating frequency f_c and elevation angle θ between the transmitter and receiver, the parameter β is evaluated as a function of both, and its influence becomes significant with increased height differences and at higher frequencies [8, 36]. The specific values of α and η, on the other hand, depend on environmental factors, such as the distribution of 3D obstacles (e.g., buildings and terrain) that categorize the environment as suburban, urban, dense urban, or high-rise urban. This determination is influenced by factors such as the transmitter-receiver location, the geographical area, the ratio of the total area covered by buildings, the mean number of buildings per unit area, and the variation in building heights, which can be modeled with a Rayleigh probability density function [36]. For instance, larger values of the built-up area ratio and the building density often indicate an environment abundant with tall buildings, characteristic of an urban setting dominated by high-rise structures. Such conditions imply a higher likelihood of NLoS situations, applicable both to GBS or ABS to GUE links and to GBS to low-altitude AUE communication links. In contrast, lower values tend to correspond to suburban areas, where there is a higher chance of near-LoS conditions for GBS-GUE/AUE links. Accordingly, the path loss exponent α generally falls within the range of 2 to 6 and is fitted according to whether the link between the transmitter and receiver is in LoS or NLoS condition. Similarly, the standard deviation of the shadowing factor η exhibits higher values in high-rise urban regions than in suburban areas.
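The paper does not reproduce the exact expression it uses for β(f_c, θ). As an assumption for illustration only, the sketch below uses the standard free-space loss at a 1 m reference distance, 20·log10(4π·d0·f/c), which captures the stated frequency dependence but not the elevation-angle dependence:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def mean_free_space_loss_db(freq_hz, d0_m=1.0):
    """Assumed free-space reference loss 20*log10(4*pi*d0*f/c) in dB.

    Illustrative stand-in for beta: the paper evaluates beta from the
    operating frequency and the elevation angle, whose exact expression
    is not reproduced here.
    """
    return 20.0 * math.log10(4.0 * math.pi * d0_m * freq_hz / C)

# At an LTE-A carrier of 2.6 GHz this gives roughly 40.7 dB at 1 m.
```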

To address the complexities of ray tracing, the Urban Dominant Path (UDP) model [37], a simpler yet comparably accurate alternative, is employed in this work. As shown in Figure 3, this model concentrates solely on the most prominent propagation path, referred to as the dominant path, connecting the transmitter (GBS or ABS) with the receiver (AUE or GUE). Notably, in a majority of propagation scenarios, approximately 90% of the received power emanates from this single path [37, 38].

The UDP model is designed to collectively represent propagated waves guided by reflections and diffractions at walls and corners, forming the dominant path. It intentionally limits the number of interactions between transmitters and receivers, reducing computational complexity while maintaining accuracy [36, 38, 39]. With this, it is possible to refine the propagation loss expression given above, where the deterministic approximation of the shadowing factor η is realized as

η ≈ Σ_{k=1}^{K} g(φ_k) − Ω.

Here, K represents the number of interactions the actual propagation path undergoes with reflecting obstacles. The function g(φ_k) captures the interaction loss in decibels, accounting for the change in the direction of propagation, represented by the angle φ_k, due to the wave’s interaction with the k-th obstacle. The loss follows a regression function with a positive intercept, gradually increasing as φ_k rises until stabilizing at larger values. The waveguiding factor, Ω, represents collective wave propagation (explained in [40]) and varies based on reflection loss, angle, and obstacle-path distance. In densely urban areas with a higher built-up ratio and building density, the reflection loss for a GUE diminishes, in turn increasing Ω.
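WinProp's actual UDP regression is not given in the text, so the sketch below only illustrates the structure of the excess-loss term: a per-interaction loss with a positive intercept that grows with the direction change and then saturates, summed over K interactions, minus the waveguiding gain. The intercept, slope, and cap values are assumptions:

```python
def interaction_loss_db(phi_deg, intercept_db=6.0, slope_db_per_deg=0.2, cap_db=25.0):
    """Assumed saturating regression for the per-interaction loss g(phi):
    positive intercept, linear growth with |phi|, then a cap for large angles."""
    return min(intercept_db + slope_db_per_deg * abs(phi_deg), cap_db)

def udp_excess_loss_db(direction_changes_deg, waveguiding_db=0.0):
    """Dominant-path excess loss: sum of interaction losses over the K
    direction changes, minus the waveguiding gain Omega (all in dB)."""
    return sum(interaction_loss_db(p) for p in direction_changes_deg) - waveguiding_db

# Two diffractions of 30 and 90 degrees with 3 dB of waveguiding gain:
# (6 + 0.2*30) + (6 + 0.2*90) - 3 = 12 + 24 - 3 = 33 dB
```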

In both scenarios shown in Figure 2, the transmitter (GBS or ABS) maintains a constant altitude above ground level, while the receiver height changes discretely. With increasing altitude above ground, α tends to approach 2, indicating free-space propagation. Consequently, the loss becomes more dependent on the separation distance d between the transmitter and receiver. Furthermore, η decreases at higher altitudes compared to ground level, as the number of reflections and diffractions within the area decreases. This means that the propagation loss for the GBS-AUE link is primarily determined by d, while the GBS-GUE and ABS-GUE links are more sensitive to environmental effects through α and η. Notably, a comprehensive examination of gradual altitude changes from 0 to 50 m and their implications on propagation loss across different frequency ranges (sub-6 GHz and at 60 GHz) is given in [41, 42].

2.2. Volumetric Radio Environment Maps

The mathematical expression of the VREM for a given 3D propagation environment is as follows:

VREM = { Λ(q_tx, q_rx) : q_rx = (nΔx, mΔy, sΔh), n ∈ {1, …, N}, m ∈ {1, …, M}, s ∈ {1, …, S} }.

Here, the 3D space is discretized along the X, Y, and Z (or h) axes, with unit resolutions Δx, Δy, and Δh, respectively. This discretization results in a grid of voxels with dimensions N, M, and S. Thus, a distinct receiver position is represented by its grid indices (n, m, s).
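The discretization above amounts to simple index arithmetic. The sketch below is a minimal illustration; the 3 m altitude step matches the dataset description, while the 2 m areal resolutions are assumed for illustration:

```python
def voxel_index(x, y, h, dx=2.0, dy=2.0, dh=3.0):
    """Map a continuous receiver position (meters) to its (n, m, s)
    grid indices for resolutions dx, dy, dh."""
    return int(x // dx), int(y // dy), int(h // dh)

def grid_shape(extent_x=512.0, extent_y=512.0, extent_h=45.0,
               dx=2.0, dy=2.0, dh=3.0):
    """Voxel-grid dimensions (N, M, S) for a given volume extent."""
    return int(extent_x / dx), int(extent_y / dy), int(extent_h / dh)
```

With these assumed resolutions, a 512 m × 512 m × 45 m sample discretizes into a 256 × 256 × 15 voxel grid.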

As depicted in Figure 4, the VREM slices obtained at varying altitudes (e.g., ground level at 1.5 m receiver height, 6 m, and 15 m above ground) offer insights into altitude-specific changes in propagation loss and environmental effects. Each slice represents a 2D REM snapshot stacked sequentially to form the 3D VREM, with adjacent altitude slices separated by Δh. As altitude changes, the composition of obstacles, reflective surfaces, and potential signal obstructions can significantly influence propagation characteristics.
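Extracting an altitude slice from a VREM volume is a plain array operation; a short numpy sketch (the slice-axis convention and array sizes are assumptions for illustration):

```python
import numpy as np

def rem_slice(vrem, altitude_m, dh=3.0):
    """Extract the 2D REM slice at a given altitude from a VREM volume
    shaped (N, M, S), with slice s covering altitudes around s*dh."""
    s = int(altitude_m // dh)
    return vrem[:, :, s]

vrem = np.zeros((256, 256, 15))   # a VREM with 15 altitude slices
ground = rem_slice(vrem, 1.5)     # ground-level slice (s = 0)
aerial = rem_slice(vrem, 15.0)    # slice s = 5
```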

2.3. Problem Formulation

In the context of this study, the construction of VREMs is framed as a supervised image-to-image (I2I) translation problem. This problem formulation finds its roots in various fields like image processing, computer graphics, and computer vision [43, 44], where it involves converting input images from a source domain into corresponding images in a target domain. This concept is harnessed to construct VREMs that accurately capture the propagation environment and transmitter location information.

To elaborate, a mapping model, denoted as F, is trained with a substantial dataset of training image pairs. This model aims to generate a synthesized target-domain image, Î_t = F(I_s), which closely resembles the true image I_t corresponding to the input source image I_s.

Each input image I_s within the dataset encapsulates the environmental map and transmitter location information, while the corresponding REM I_t captures the propagation characteristics of the signal. This pairing serves as the basis for training our VREM construction models, enabling them to learn the transformation from one image domain to the other. Various I2I translation methods are available that attempt to learn the mapping between different image domains [43–46].
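The L1 loss reported for the results (e.g., the minimum Vol2Vol loss of 0.01) is the mean absolute error between a constructed map and its ground truth. A minimal numpy sketch, assuming the maps are normalized to [0, 1]:

```python
import numpy as np

def l1_loss(constructed, true_map):
    """Mean absolute pixel/voxel error between a constructed REM/VREM
    and the ground-truth map."""
    constructed = np.asarray(constructed, dtype=float)
    true_map = np.asarray(true_map, dtype=float)
    return float(np.mean(np.abs(constructed - true_map)))
```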

Notable architectures like RadioUNet (a UNet specifically designed for REM construction in [35]) and Spider-UNets (a fusion of UNets and Long Short-Term Memory (LSTM) networks designed for segmented image construction in [46]) utilize generative models such as UNets and GANs for image translation. GANs consist of two models, a generator G and a discriminator D, that work in tandem to generate target images and distinguish them from real ones. The UNet, a deep convolutional autoencoder network, was originally designed for image segmentation and facilitates direct feature mapping between input and output images.

In this study, GANs and UNets are employed independently or in combination to address the I2I translation challenge in VREM construction. They serve as foundational frameworks for the proposed approaches, providing a basis for generating VREMs that effectively capture the intricate interplay between the radio environment and propagation phenomena. The subsequent section delves into further detail on these frameworks, elucidating their utilization in the context of this research.

3. Proposed Deep-Learning Approaches for Volumetric Radio Map Construction

Following the broader trend toward data-driven learning approaches for solving REM estimation problems, this paper introduces two distinct deep-learning-driven strategies for VREM construction. In both methodologies, the environmental map and transmitter location are interpreted as input images, which undergo transformation into corresponding propagation loss maps that vividly depict the 3D environment’s characteristics. Sections 3.1 and 3.2 elucidate the details of these two deep-learning-based approaches.

3.1. Vol2Vol-VREM Construction with 3D-Generative Adversarial Networks

To estimate the characteristics of the volumetric channel directly by learning from the 3D inputs, i.e., as a form of Vol2Vol translation, we propose a 3D-GAN-based network conceptually similar to the Pix2Pix architecture in [45]. However, in contrast to the Pix2Pix architecture, which uses a conditional GAN to translate a semantic label into a realistic-looking image, the proposed Vol2Vol-VREM construction approach in Figure 5 implements a UNet-based 3D-GAN to learn the one-to-one mapping between the 3D environment and transmitter location maps and the corresponding REM. Our selection of the 3D-GAN architecture was motivated by its superior ability to construct high-fidelity 3D images. Compared to traditional interpolation or deep CNNs, 3D-GANs possess a distinct advantage in capturing the intricate spatial relationships and subtle details inherent to 3D environments.

3.1.1. Architecture

The main structure of the 3D-GAN network is illustrated in Figure 5, and the detailed architecture is tabulated in Table 1. The generator G “translates” the environment and transmitter location map input image into the corresponding propagation loss map image. It uses a 3D-UNet-based architecture that relies on skip connections (represented by broken arrows) between each layer of the encoder (red blocks) and decoder (dark blue blocks). With 3D convolution and max-pooling layers, the encoder extracts features that correspond to the 3D effects of distance, buildings, and other environmental factors on the transmitter-receiver propagation link. A max-pooling layer follows each paired convolution layer to shrink the input resolution and help each layer of the UNet extract spatially correlated features at the various input resolutions. Two convolution units and a max-pooling layer together form the encoder basic layer (BL-enc) of the UNet and are used as the building block of the four-layered encoder in G. For the decoder, the key aspect is the use of transposed 3D convolution layers with a stride of two in place of 3D convolution and max-pooling, respectively, to construct the decoder’s basic layers (BL-dec). The stride of two acts as an upsampling unit, upscaling with the same settings as the corresponding encoder layers. In addition, the skip connections woven between every encoder and decoder layer are expected to boost the decoder’s performance: they allow extra environmental features from each encoder level to flow through, compensating for any potential loss due to compression.
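The resolution bookkeeping implied by the four BL-enc stages can be sketched with simple arithmetic, assuming 'same'-padded convolutions so that only the stride-2 max-pooling changes the spatial resolution (the input shape below is illustrative, not taken from Table 1):

```python
def encoder_resolutions(input_shape, n_layers=4):
    """Spatial resolutions after each BL-enc stage, assuming each stage
    halves every dimension via a stride-2 max-pool and the convolutions
    preserve resolution ('same' padding)."""
    shapes = [tuple(input_shape)]
    for _ in range(n_layers):
        shapes.append(tuple(d // 2 for d in shapes[-1]))
    return shapes

# An assumed (256, 256, 16) input volume through the four-layer encoder:
# 256x256x16 -> 128x128x8 -> 64x64x4 -> 32x32x2 -> 16x16x1
```

The decoder's transposed convolutions with stride 2 invert exactly this halving, which is why the skip connections can concatenate feature maps of matching resolution at each level.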

In contrast, the discriminator D estimates the probability that the VREM generated from the source data is real or not. To achieve this, D is formulated as a typical image classifier comprising six layers of progressively shrinking 3D convolution layers. This classifier takes both the generated VREM and the original VREM as inputs. The LeakyReLU activation function is employed after each convolutional layer, except for the final layer, which uses the sigmoid activation function, ensuring real-valued generation and model stability. The sigmoid layer produces a scalar output indicating whether the input VREM image is an accurate reconstruction or not. Batch normalization with random normal initialization is applied across all layers of G and D, excluding the output layer of G and the input/output layers of D. Consequently, the optimization objective forms a Min-Max strategy, working to enhance the ability of the model to correctly classify the estimated VREM and learn from the input data. This approach is formally defined as

min_G max_D V(D, G) = E_{I_s, I_t}[log D(I_s, I_t)] + E_{I_s}[log(1 − D(I_s, G(I_s)))].
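The Min-Max value function can be evaluated pointwise for a single sample; a toy sketch assuming sigmoid discriminator outputs in (0, 1), purely to illustrate the objective's behavior:

```python
import math

def gan_value(d_real, d_fake):
    """One-sample GAN value: log D(real) + log(1 - D(fake)).
    The discriminator maximizes this; the generator minimizes the
    second term by pushing D(fake) toward 1."""
    return math.log(d_real) + math.log(1.0 - d_fake)

# A confident discriminator (d_real -> 1, d_fake -> 0) drives the value
# toward 0; at the classic equilibrium D(.) = 0.5 the value is log(0.25).
```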

3.2. Sliced-VREM Construction with Altitude-Aware Spider-UNets

To create a VREM, it is essential to have a model that accounts for how the channel’s characteristics vary with altitude. However, using 3D models for this purpose poses challenges due to high memory usage and difficulties in handling the varying number of altitude slices caused by the diverse propagation environment and the randomness of scattering objects. To overcome this, we adopted an alternative approach called sliced-VREM (S-VREM). Since a volumetric image can be represented as a sequential stack of 2D images, the propagation loss at a given altitude can be learned with altitude-aware 2D UNets, similar to the Spider-UNets architecture proposed for medical image segmentation in [46]. This architecture processes a sequence of single-plane 2D images in parallel, with the modified 2D UNets sharing information among themselves to capture altitude-related patterns. Although this Spider-UNet-like architecture considers slices of VREMs and maintains altitude awareness, it does not generate a full VREM without a simple interpolation along the height/altitude axis; for that reason, this model can also be referred to as a partial 3D or 2.5D REM construction.

3.2.1. Architecture

The high-level architecture of our Sliced-VREM methodology is outlined in Figure 6, with detailed components described in Table 2. The network consists of two main pathways, similar to the setup described in reference [46]. The first path entails the parallel stacking of 2D convolutional layer-based autoencoders, UNets. Each U-Net is designed to understand environmental impacts and transmitter locations within a specified geographical region, constructing a 2D REM to represent propagation loss at a particular altitude level. This simultaneous processing of image slices improves efficiency. The encoder part of UNet includes 4 sets of custom layers, each consisting of a pair of 2D convolution and 2D max-pooling layers forming the basic building blocks (BL-enc). Similarly, the decoder part of UNet is built with four custom layers, each including a transposed 2D convolution, a skip connector from the encoder, and a single BL-enc layer.

The second path, depicted by the orange-shaded section in Figure 6, captures interslice correlations within sequential REM image slices. This is facilitated by a memory-based recurrent neural network, specifically a Long Short-Term Memory (LSTM) network combined with convolutional layers to form convolutional LSTM (Conv-LSTM) layers. A Conv-LSTM layer with bidirectional connections is placed at the center of each UNet stack to share the spatial features learned at various heights. In both the encoder and decoder, a LeakyReLU activation function is used after each convolutional layer. By accommodating both positive and negative input ranges, this activation function helps the model represent the intricate patterns of the radio propagation environment. The number of parallel-stacked UNets, denoted n, shapes the depth of the VREM and determines the maximum sequence size the model can manage. As the number of stacked UNets increases, so does the complexity of model training. To ensure efficient and effective training, a stack length of 3 or 5 was chosen; that is, three or five consecutive REMs are simultaneously constructed and consolidated to form the VREM. This choice strikes a balance between complexity and training performance, facilitating the creation of a comprehensive and accurate VREM representation.
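The interslice pathway can be sketched as a bidirectional ConvLSTM placed at the UNet bottleneck, so that features learned at neighbouring altitudes flow both up and down the stack. The tensor sizes here (n = 5 slices, an 8 × 8 bottleneck, 64 filters) are illustrative assumptions, not the paper's exact dimensions.

```python
# Sketch of the Conv-LSTM bridge: bottleneck feature maps from the n
# parallel UNets are treated as a sequence along the altitude axis and
# passed through a bidirectional ConvLSTM2D.
import tensorflow as tf
from tensorflow.keras import layers

n_slices = 5                                      # number of parallel-stacked UNets
bottleneck = layers.Input((n_slices, 8, 8, 64))   # (altitude, H, W, C)

# Bidirectional ConvLSTM shares spatial features across altitude slices.
shared = layers.Bidirectional(
    layers.ConvLSTM2D(64, 3, padding="same", return_sequences=True),
    merge_mode="concat",
)(bottleneck)

bridge = tf.keras.Model(bottleneck, shared)
```

With `merge_mode="concat"`, the forward and backward passes double the channel count, so each slice's bottleneck receives context from both lower and higher altitudes.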

4. Experiment and Results

4.1. Addis Dataset Description

To construct VREMs based on 3D radio environmental data, we generated and utilized a new dataset called the Addis Dataset. This comprehensive dataset features 54,000 samples, each representing a 512 m × 512 m × 45 m volumetric space. The data were collected across diverse terrains and building types, capturing the resulting variations in signal propagation conditions. This diversity equips the dataset to generalize robustly to a wide range of environmental conditions.

The altitude range for GBS or hovering ABS to GUE links is from ground level to 45 m. This height is determined by adding 5 m to the maximum building height in the geographical area, ensuring comprehensive coverage. For GBS-to-low-altitude AUE communication, the altitude range is from 75 to 90 m, which accommodates the practical flight heights of AUEs. The actual radio propagation is simulated using WinProp from the Feko 2021.1 suite, which uses the urban dominant path (UDP) model to generate one REM per simulation iteration at 3 m altitude increments.

The simulation parameters used to calculate propagation loss and generate REMs in WinProp are provided in Table 3. Each REM, with a grid resolution of 1 m × 1 m, is simulated for a particular geographical area, transmitter location, and UAV-assisted network scenario. As the simulation is repeated for 15 different altitude levels, the resulting REMs can be stacked to create a VREM.
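The stacking step can be sketched in a few lines of NumPy. The shapes follow the dataset description (512 × 512 grid at 1 m resolution, 15 altitude levels at 3 m spacing); the random arrays stand in for simulated REMs, and the altitude indexing (3 m to 45 m) is an assumption.

```python
# Minimal sketch of assembling a VREM: 15 simulated 2D REMs, one per 3 m
# altitude step, are stacked along a new altitude axis.
import numpy as np

n_alt, alt_step = 15, 3            # altitude levels and 3 m spacing
rems = [np.random.rand(512, 512).astype(np.float32) for _ in range(n_alt)]

vrem = np.stack(rems, axis=-1)     # shape (512, 512, 15)
altitudes = (np.arange(n_alt) + 1) * alt_step  # 3, 6, ..., 45 m (assumed indexing)
```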

4.2. Model Training

Our proposed models, Vol2Vol VREM and sliced-VREM, were implemented in TensorFlow and trained on the EthERNet high-performance computing cluster (https://hpc.ethernet.edu.et/). This cluster comprises 20 nodes, each equipped with 40 CPU cores and approximately 185 GB of memory. To optimize memory usage and ensure efficient training, the input images (environment maps and transmitter locations) were downsampled from 512 × 512 to 128 × 128 resolution. Both models also used early stopping, halting training after 80 consecutive epochs without improvement in validation loss, which conserves computing resources and improves model generalizability. In addition, the training, validation, and testing sets were created using a fixed seed, ensuring complete separation and consistent reproducibility.
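The fixed-seed split can be sketched as follows: a seeded generator shuffles the sample indices once, and the permutation is sliced into disjoint train/validation/test sets. The 80/10/10 proportions and the seed value are illustrative assumptions; the paper does not state its exact split.

```python
# Reproducible train/val/test split over the 54,000 Addis dataset samples.
import numpy as np

def split_indices(n_samples, seed=42, val_frac=0.1, test_frac=0.1):
    rng = np.random.default_rng(seed)          # fixed seed -> same split every run
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train, val, test = split_indices(54_000)
```

Because the permutation is fully determined by the seed, rerunning the split yields identical sets, which keeps the test set untouched across experiments.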

The selection of loss functions for both models prioritizes computational efficiency. To that end, the generator of the 3D-GAN network and the Spider-UNet networks use the Mean Absolute Error (MAE) loss function, which achieves accurate slice-wise or volumetric REM representation without the added complexity of perceptual loss functions. For the discriminator of the 3D-GAN network, the standard binary cross-entropy loss is used to assess the realism of the generated VREM.

Training 3D GANs presents a unique challenge due to their sharp gradient landscape, which can lead to model instability. We explored two solutions: gradient penalties with a regularization value of 10, and weight clipping. While gradient penalties offered stability benefits, their computational overhead significantly lengthened training time (by more than 4 hours). We therefore opted to clip the discriminator’s weights to 30, combined with a zero-mean normal weight initializer with a standard deviation of 0.02, which stabilized training while maintaining efficiency. Both models were trained using the Adam optimizer. In addition, dropout regularization with a probability of 0.2 was applied after each building block in the generator.
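The weight-clipping step can be sketched as a post-update constraint on the discriminator's weight tensors. The clip value follows the paper's stated 30; the toy weight tensor and the injected outlier are illustrative.

```python
# WGAN-style weight clipping for the discriminator, applied after each
# optimizer step: every weight is constrained to the range [-c, c].
import numpy as np

def clip_weights(weights, c=30.0):
    """Return the weight tensors with every entry clipped to [-c, c]."""
    return [np.clip(w, -c, c) for w in weights]

# Toy example: weights initialised from a zero-mean normal (std 0.02),
# with one value pushed out of range to mimic an unstable update.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4, 4))
w[0, 0] = 45.0                     # out-of-range weight after an update
clipped = clip_weights([w])[0]     # w[0, 0] is now capped at 30.0
```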

The training parameters summarized in Table 3 optimized training for Vol2Vol and sliced-VREM, laying a robust foundation for VREM construction performance.

4.3. VREM Construction Performance

In this section, we present the VREM construction results of the two proposed approaches, evaluated primarily through two comparison metrics.

For measuring pixel-wise accuracy, the mean absolute error (MAE, or L1 loss) is used:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|P_i - \hat{P}_i\right|,$$

where $P_i$ and $\hat{P}_i$ represent the actual and predicted propagation loss at pixel $i$ of the VREM image, for $N$ pixels.
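As a numerical check of the MAE metric, the following computes it on toy propagation-loss arrays; the values and shapes are illustrative.

```python
# MAE between actual and predicted propagation-loss maps (any matching shape).
import numpy as np

def vrem_mae(actual, predicted):
    """Mean absolute error over all N pixels/voxels of the VREM."""
    return np.mean(np.abs(actual - predicted))

actual = np.array([[100.0, 110.0], [120.0, 130.0]])
predicted = np.array([[102.0, 108.0], [121.0, 126.0]])
print(vrem_mae(actual, predicted))   # 2.25
```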

The other metric considered is the Structural Similarity Index (SSIM), which assesses the structural similarity between the constructed and real VREM images, considering their luminance $l(x,y)$, contrast $c(x,y)$, and structure $s(x,y)$. The overall index is given as

$$\mathrm{SSIM}(x,y) = \left[l(x,y)\right]^{\alpha}\left[c(x,y)\right]^{\beta}\left[s(x,y)\right]^{\gamma},\qquad
l(x,y)=\frac{2\mu_x\mu_y + C_1}{\mu_x^2+\mu_y^2+C_1},\quad
c(x,y)=\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2+\sigma_y^2+C_2},\quad
s(x,y)=\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y+C_3},$$

where $\mu_x, \mu_y$, $\sigma_x, \sigma_y$, and $\sigma_{xy}$ are the local means, standard deviations, and cross-covariance for the constructed and real REM images. The constants $C_1$, $C_2$, and $C_3$ stabilize the quotients in low-luminance and low-contrast regions of the images. Under the common assumptions $\alpha=\beta=\gamma=1$ and $C_3=C_2/2$, equation (11) simplifies to

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}.$$
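The simplified SSIM formula can be computed directly with NumPy. Note that this is a single global window for illustration; practical evaluations slide a local window over the image (e.g. scikit-image's `structural_similarity`), and the `data_range` and constants here follow the common defaults $K_1 = 0.01$, $K_2 = 0.03$.

```python
# Global (single-window) version of the simplified SSIM formula.
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()          # means
    vx, vy = x.var(), y.var()            # variances
    cov = ((x - mx) * (y - my)).mean()   # cross-covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

x = np.linspace(0, 1, 64).reshape(8, 8)
print(global_ssim(x, x))                 # 1.0 for identical images
```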

The result of the Vol2Vol approach to VREM construction is presented in Figure 7. For a randomly selected communication link, i.e., the GBS-GUE communication link, the comparison with the ground truth is done both at different heights and as a whole. As the results show, the model was able to learn the impact of the environment on signal propagation. However, Figure 7(a) reveals that the model has difficulty capturing the extended impact of narrow buildings located farther from the transmitter, particularly at lower altitudes (indicated by the red arrows), because the model cannot account for the long-range propagation effects of these buildings. At higher altitudes, however, the model accurately captures the propagation loss, as seen in Figure 7(a). This trend is further underscored by the SSIM values in Table 4, which measure image quality at different altitudes. The higher SSIM values at higher altitudes (approaching 1, indicating perfect similarity) directly confirm the model’s improved accuracy in those regions.

For the sliced-VREM (n = 5-stack) approach, the constructed VREM at different altitudes is illustrated in Figure 8, where the estimated map is pictorially compared to the ground truth. The 5-stack model refers to five UNets simultaneously learning the propagation channel characteristics from five consecutive REMs. In this case, the VREM generated at once has a height of 15 m (5 × 3 m resolution).

As mentioned in Section 3.2, the number of parallelly stacked UNets in the sliced-VREM approach governs the trade-off between the complexity and the depth of the constructed VREM. If the number of stacks is increased to the maximum number of discretized altitude levels, one can construct the VREM while accounting for the broad altitude dependency. However, as altitude increases and the number of scatterers between the transmitter and receiver decreases, the channel characteristics become largely deterministic (and thus easily captured by the model) and dominantly dependent on the GBS-GUE/AUE or ABS-GUE distance. In other words, above a certain altitude, the propagation loss is less dependent on shadowing and can be modeled by the frequency-dependent and free-space losses. With this understanding, it is possible to limit the consideration of higher altitudes and extrapolate beyond a particular height above ground level, as shown in Table 4. On the other hand, a smaller number of stacks reduces the complexity and memory needed for map construction, but it also shortens the reconstructed volumetric height (stack) of the map. When the sliced-VREM takes only a single layer, it becomes similar to the 2D REM construction models in [34, 35].

Tables 4 and 5 summarize the performance comparison of our proposed approaches. At lower altitudes, Vol2Vol-VREM exhibits slightly degraded constructed-image quality compared to sliced-VREM, as evidenced by Table 4. This can be attributed to the inherent limitations of the 3D GAN in capturing finer details and sharper edges. However, as altitude increases, the deterministic nature of propagation loss plays to Vol2Vol-VREM’s advantage. Its direct modeling of signal propagation allows it to excel over the sliced approach, resulting in superior performance at higher altitudes. This superiority is further confirmed by Table 5, where Vol2Vol-VREM achieves a consistently lower MAE than sliced-VREM, solidifying its accuracy despite the increased complexity of training and constructing large-volume VREMs.

Although no related approach for a one-to-one comparison of VREM construction could be found, we used the Addis dataset to evaluate the 2D deep learning approaches in [34, 35]. As shown in Table 6, while the 2D REM models perform comparably to the proposed VREM approaches, their inability to extract the altitude dependency of propagation loss from nearly identical maps limits their performance.

5. Conclusions

Future wireless networks are evolving to include flexible and 3D network deployments. In these networks, it is critical to have complete knowledge of the radio environment in a large geographical area. However, due to the randomness associated with the propagation environment, attaining volumetric propagation environment awareness is challenging.

To this end, this paper presents a new concept of VREM and proposes two deep learning-based techniques for constructing maps using transmitter location and 3D geographical map information, including terrain and building data. The models were trained using a large REM dataset that was carefully prepared by considering real-world environmental data and different GBS-GUE/AUE-ABS communication links. The results demonstrated that these models can accurately capture the impact of obstacles and the distance between the transmitter and the receiver on the propagation loss to provide precise VREM.

This work opens doors to several exciting future directions. Firstly, extending its application to generate volumetric spectrum maps for network optimization and resource allocation holds immense potential. This would empower real-time network adaptations based on intricate spatial signal variations, enhancing efficiency and performance. Secondly, augmenting the training data with additional prior information represents a promising avenue for improvement. Leveraging partial channel knowledge, incorporating further antenna parameters, or exploring more complex learning architectures could significantly refine the generated VREMs. Fine-tuning the architecture and optimizing the loss function training procedures also offer significant potential. For instance, incorporating perceptual loss functions like SSIM and implementing progressive training strategies tailored to specific altitude ranges are promising approaches worth exploring. Finally, this work paves the way for expansion to other volumetric radio maps beyond VREMs. Generating volumetric channel state information maps and volumetric interference maps could unlock further optimization opportunities for wireless networks. These maps would enable deeper insights into signal behavior and empower real-time interference management, ultimately leading to enhanced network performance and user experience.

Data Availability

The dataset is available at https://codeocean.com/capsule/7136637/tree and access will be granted upon request to the corresponding author.

Disclosure

This work was partially funded as part of a Ph.D. program.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by Addis Ababa University.