Abstract

Data collection and energy consumption are critical concerns in Wireless sensor networks (WSNs). To address these issues, both clustering and routing algorithms are utilized. Therefore, this paper proposes an intelligent energy-efficient data routing scheme for WSNs utilizing a mobile sink (MS) to save energy and prolong network lifetime. The proposed scheme operates in two major modes: configure and operational modes. During the configure mode, a novel clustering mechanism is applied once, and a prescheduling cluster head (CH) selection is introduced to ensure uniform energy expenditure among sensor nodes (SNs). The scheduling technique selects successive CHs for each cluster throughout the WSNs’ lifetime rounds, managed at the base station (BS) to minimize SN energy consumption. In the operational mode, two main objectives are achieved: sensing and gathering data by each CH with minimal message overhead, and establishing an optimal path for the MS using the genetic algorithm. Finally, the MS uploads the gathered data to the BS. Extensive simulations are conducted to verify the efficiency of the proposed scheme in terms of stability period, network lifetime, average energy consumption, data transmission latency, message overhead, and throughput. The results demonstrate that the proposed scheme outperforms the most recent state-of-the-art methods significantly. The results are substantiated through statistical validation via hypothesis testing utilizing ANOVA, as well as post hoc analysis.

1. Introduction

Wireless sensor networks (WSNs) are made up of several resource-constrained sensor nodes (SNs). The SNs sense the physical environment around them and transmit their sensory data to the base station (BS) [1]. Unlike SNs, the BS has substantial resources, so it gathers data, analyzes it, and then sends the useful information via the internet to the cloud server or end user [2]. Nowadays, WSN technology has become smart and can be exploited for different functionalities: intercommunication, decision-making, military surveillance, tracking objects’ movements and speeds, and monitoring critical conditions such as temperature, humidity, and pressure [36]. In Jain et al.’s [7] study, the authors survey hierarchical routing protocols in WSNs with a mobile sink (MS), focusing on event-driven and query-driven scenarios. They discuss the challenges of sink mobility and the need for tailored routing protocols based on application requirements. The paper provides a comparative analysis of these protocols, highlighting their functionalities, advantages, and performance parameters.

A key protective concern in the WSN architecture is preventing breaches by cybercriminals. Attackers may try to alter the behavior of normal SNs through different types of attacks: eavesdropping, node capture, and spoofing. Securing the aggregated data routed from the SNs to the sink is therefore one of the most important issues. In general, wireless sensors may be exposed to adversaries who listen to the wireless channel and capture the transferred data without authorization. In Okine et al.’s [8] study, the authors present a novel approach for routing in tactical wireless sensor networks (T-WSNs) used in military operations. These networks face unique challenges like jamming attacks, which disrupt data communication and complicate packet routing. The proposed solution utilizes distributed multiagent deep reinforcement learning to overcome these challenges and find reliable routes while meeting strict delay and energy requirements. It considers factors such as hop count, one-hop delay, packet loss rate, and energy cost in action reward estimation. Comparative analysis shows that the proposed scheme outperforms existing algorithms in terms of packet delivery ratio, packet delivery time, and energy efficiency, making it a promising solution for routing in T-WSNs under jamming attacks.

However, WSNs are often deployed in inaccessible environments, and the SNs are limited by batteries that cannot be recharged or replaced [9, 10]. Given these limitations, energy preservation is vital in designing an effective routing protocol. In addition, the lifetime of a WSN is directly tied to the battery life of its SNs [11]. An effective routing protocol should achieve several goals: minimize the consumed energy, maximize the packet delivery ratio, enhance throughput, extend network lifetime, and reduce computational overhead. Routing protocol design rests on two aspects: an efficient clustering method and the use of an MS to communicate with the BS. These factors significantly affect the achievement of the aforementioned goals [12]. Consequently, considerable research has been devoted to saving the energy of the SNs [13, 14].

Current energy-saving methods partition the entire network area into distinct clusters, each with a cluster head (CH) [15]. The major role of a CH is to gather the sensory data from its cluster members (CMs) and send it to the BS, which considerably extends the network lifetime. However, the main drawback of these clustering techniques concerns the SNs nearer to the CH: they send more packets than the distant SNs and are therefore exposed to premature death. Two strategies have been used to address this problem. The first repartitions the monitored area, but this can produce several separated network segments that may be unable to communicate with the BS, causing poor network performance [16]. The second rotates the CH role among all sensors to distribute the energy consumption among the CMs. However, current approaches require a large number of overhead messages, which causes transmission delays and degrades network performance. They also rely on heavyweight computation to resolve routing problems (e.g., heuristic and meta-heuristic-based routing algorithms).

These approaches are designed without considering the mentioned limitations [17]. Moreover, some of them may not scale with the network size, so they struggle when applied to real-world WSNs [18]. Recently, several approaches have emerged that employ an MS to gather data from the deployed SNs [17]; the MS then delivers the gathered data to the BS. This decreases the energy consumption of the SNs and extends the lifetime of the WSN [19]. MS-based approaches can be classified into two types [20]. In the first type, the MS visits each SN, gathers its data, and finally sends the gathered data of all SNs to the BS. This strategy reduces the energy consumption of the SNs and balances energy utilization among them, but collecting data from every SN individually causes high data gathering latency and leads to buffer overflow within the SNs. The second type employs a rendezvous points (RVPs) strategy to overcome the problems of the first type. The RVPs are positioned within the WSN such that the MS visits these positions to acquire the data from SNs/CHs. However, this strategy suffers from a massive message overhead (MO) to find RVPs within the WSN. Moreover, it does not consider energy balancing among the SNs, which causes the premature death of the WSN.

The proposed work’s significant contributions can be summed up as follows:
(1) This paper presents an intelligent energy-efficient data routing scheme for WSNs utilizing an MS, which aims to balance the consumed energy through the anticipated operations in the WSN topology.
(2) The proposed work further proposes a minimum data gathering tour for the MS that considerably decreases the data collecting time and enhances the overall performance of the network.
(3) Furthermore, the proposed scheme constructs a prescheduling map so that the energy spent by each SN is the same as, or as near as possible to, that of the other SNs.
(4) The proposed work introduces a clustering mechanism to divide the monitoring area into equal-size clusters in the configure mode, before enabling the topology operations.
(5) In addition, a modified time division multiple access (TDMA) is presented to assign a constant and ordered time slot to each sensor during its sensing operation. Moreover, the CH operations are distributed among sensors in a prescheduled order through the configure mode.
(6) Additionally, the proposed scheme adds a mechanism that enables each SN to self-activate as a CH in its order without the use of extra communication messages.
(7) The genetic algorithm (GA) is utilized for MS trajectory optimization.
(8) Statistical validation of the comparative results is further conducted utilizing ANOVA and subsequent post hoc analysis.

The proposed scheme offers versatile applications across diverse real-world scenarios. In precision agriculture, the scheme optimizes data gathering tours, enabling efficient monitoring of soil conditions and crop health. This aids farmers in making informed decisions regarding irrigation and fertilization. Additionally, in wildlife monitoring applications, the MS navigates through wildlife habitats, collecting data on animal behavior and environmental conditions without causing disruptions. In industrial automation, the scheme enhances efficiency by optimizing data collection in manufacturing processes. It minimizes the time required for gathering critical data, improving decision-making in control systems. For smart city infrastructure, the MS strategically collects data from SNs in urban environments, optimizing city management systems and reducing response times. Healthcare monitoring benefits from the scheme’s efficient data collection, ensuring timely and accurate health monitoring in both hospital and remote patient care settings. Furthermore, in environmental monitoring, the scheme’s adaptability allows it to navigate challenging terrains, optimizing data collection for climate and ecological research. Overall, these scenarios highlight the scheme’s broad applicability, showcasing its potential impact in addressing energy efficiency and performance challenges in WSNs across various real-world settings.

An earlier version of this paper, under a different title, is available at SSRN [21]. The remainder of the paper is organized as follows: Section 2 describes the related work. Section 3 presents the network model. Section 4 illustrates the proposed scheme in detail. Section 5 presents the performance evaluation and the experimental results. Finally, the conclusion is presented in Section 6.

2. Related Work

Several approaches focus on clustering techniques and energy-aware routing protocols for an MS in WSNs [22]. However, most of these approaches rely on heavyweight processes to resolve WSN clustering and to deal with MS problems. This section offers a brief literature review of the existing attempts at both clustering and MS-based data gathering. In Huang and Savkin’s [23] study, an unequal-sized cluster-based routing protocol is presented to perform data gathering in WSNs. The proposed protocol tries to balance energy consumption across the network to extend the lifetime of all nodes as much as possible. It allows the MS to follow a fixed mobility model to collect the data from the CHs. Moreover, this protocol chooses a relay CH for optimum data transmission. However, this protocol struggles when a hot spot (energy hole) arises close to the trajectory of the sink. In Mehto et al.’s [24] study, the authors introduce TARA, an efficient trajectory planning and route adjustment approach designed for WSN-assisted Internet of Things (IoT) environments. It addresses the challenge of efficient data transmission and collection in WSNs, where IoT devices face resource constraints such as limited energy, computing capabilities, and storage availability. TARA divides the deployment region into a uniform grid, identifying optimal rendezvous grid-cells for MS data collection. It is reported to reduce both energy consumption and delay compared to state-of-the-art techniques.

Wen et al. [25] suggested an energy-aware path construction (EAPC) algorithm for WSNs. The algorithm selects a few RVPs within the network and builds a route between them. At those points, the MS gathers large amounts of data. EAPC is designed to extend the network lifetime and to compute the traveling cost from one point to another. However, this algorithm suffers from excessive transmission delays and network partition problems. In Wang and Chen’s [26] study, an efficient path planning scheme for MS data gathering in WSNs is introduced. It aims to reliably gather data from sensors with diverse sensing rates. This scheme uses both the hop distance and the amount of data collected by the SNs to select RVPs within the network. However, it defines many RVPs, which considerably increases the data gathering time and leads to buffer overflow.

Fu and He [27] presented a balanced inter-cluster and inner-cluster energy (BIIE) algorithm for WSNs. In this method, a local reclustering mechanism balances the energy consumption within each cluster based on the residual energy of its sensors. In addition, the method provides a mechanism to select very few RVPs to serve several CHs. These RVPs are used to build the MS’s trip path for gathering data. However, using very few RVPs increases the data gathering time and causes the buffer overflow problem. In Mehto et al.’s [28] study, a squirrel search algorithm-based rendezvous points selection (SSA-RVPS) is introduced for the MS to reliably acquire data from WSNs. SSA-RVPS aims to extend the network lifetime by reducing the trajectory length of the MS when gathering data from SNs with different data generation rates. However, SSA-RVPS may need to reselect RVPs to ensure a balanced energy distribution among SNs. Its main drawback is data loss due to buffer overflow, which occurs when RVPs receive more data packets than their available buffer space.

Gutam et al. [20] offered an optimal RVPs selection method to construct the MS trajectory for data collection in WSNs (ORPSTC). Initially, the authors apply the minimal cost spanning tree (MST) to construct the intended clusters. Next, each CH is identified, and an RVP is selected for each cluster. ORPSTC creates an efficient trajectory for the MS using a low-computation geometry algorithm called MS trajectory construction (MSTC). Along the MS tour, virtual RVPs (VRVPs) are defined for the SNs whose communication range is adequate to connect directly to the MS. However, the VRVPs communicate their data to the closest RVPs if the MS moves out of their communication range. Although the derived path considers the sequence of the RVP locations to improve data collection, it may not be the shortest one.

Agarwal et al. [29] proposed an intelligent data routing technique for WSNs based on MS for data collection. They employ particle swarm optimization (PSO) for the optimal cluster formation. Next, the RVPs are evaluated based on the average of all the XY coordinates of the SNs of each region. Finally, they utilize these RVPs to draw the MS path for data-gathering tour. The main disadvantage of this approach is the evaluation of the RVPs without any consideration for the actual position of the CHs. In addition, the proposed scheme is not appropriate when employing disconnected networks.

Singh et al. [30] suggested a genetic algorithm for sink mobility technique (GA-SMT). This approach partitions the entire network area into various small-sized regions and picks an RVP for each region using the GA process. However, it uses a vast number of messages to handle and manage the small-sized regions, which increases the consumed energy and decreases the network lifetime. In Sahoo et al.’s [31] study, GA and PSO are merged in a hybrid algorithm known as GAPSO-H. This algorithm is employed for routing with sink mobility (SM) and for CH selection. SM is handled by the PSO algorithm, which, however, fails to apply the fitness parameters for routing.

In Wang et al.’s [32] study, another hybrid approach based on PSO and GA is used to construct a path scheduling technique based on the coverage rate of multiple MSs (TSCR-M). In TSCR-M, the RVPs are first established by an enhanced PSO algorithm that accounts for the sensor coverage and overlapped coverage rates. Next, GA is applied to select the most reasonable route for the multiple MSs. However, the GA fails to address the stability period of the network. Gowda and Jayasree [33] offered a routing scheme for WSNs that combines a group teaching algorithm with Bald Eagle search (GTA-BE). In this methodology, clusters are created through the mean shift clustering method. The Bald Eagle Search mechanism is employed to select the CHs, while RVPs are determined based on the number of transferred data packets and the hop distance. Finally, a hybrid neural network is combined with the group teaching algorithm to select the optimal path between SNs and RVPs. The main shortcoming of this algorithm is that it incurs massive MO and transmission delay.

Kumar et al. [34] offered an ant colony optimization-based MS path determination (ACO-MSPD) scheme for WSNs. This scheme seeks to select the optimal CHs to meet the delay requirements and balance the energy consumption of the SNs. Moreover, it restricts the maximum touring distance of the MS and chooses a number of RVPs that does not exceed the threshold value of the MS tour. However, this scheme incurs high computational complexity. Donta et al. [35] presented an extended ACO to construct the MS path for event-driven WSNs. In this approach, the maximum distance of the MS tour is fixed, and the RVPs are selected according to the SNs’ data generation rate, subject to a threshold value. In addition, each selected set of RVPs remains in use and changes only when the SNs’ data generation rate changes. The main disadvantage of this approach is the large amount of time consumed to select RVPs.

Gupta and Saha [36] offered a hybrid meta-heuristic algorithm-based data routing method for WSNs. Both artificial bee colony and differential evolution (ABC-DE) mechanisms are employed to balance the energy spending among the CHs. In addition, MS-based data collection is performed to gather the data from the CHs. However, this algorithm suffers from a slow convergence rate and a decline in network lifetime. Furthermore, the SNs consume very high energy, which reduces the overall performance of the network. In Raj et al.’s [37] study, the problem of constructing a consistent and intelligent route for the MS is addressed using a game theory and improved ACO-based MS route choice and data gathering (GTAC-DG) approach. The MS route is structured by an ACO-based algorithm that uses the selected RVPs. Although the GTAC-DG algorithm creates a convincing route for the MS and constrains the use of multihop data transfer, its main drawback is that it ignores the CH selection process. Tables 1 and 2 provide an overview of the related work.

3. Network Model

In the proposed scheme, the WSN topology is divided into a set of clusters. Each cluster comprises a number of SNs. As shown in Figure 1, each SN has several components: a sensing unit, a processing unit, a communication system, and a power unit. The sensing unit includes a sensor device and a global positioning system (GPS). The sensor device senses a physical event or phenomenon and then sends the gathered information to the CH node. The GPS provides location knowledge with high accuracy. The processing unit comprises a microcontroller unit (MCU), memory, and the required operating system. The MCU executes the self-coding process and the necessary computation on the collected data when the SN is employed as a CH. The transceiver is a broadband radio communication system that transmits the gathered data between nodes and their CH, and then between the CH and the MS. Finally, the power unit is a battery that provides energy for the SN and its optional components. However, the SN architecture is constrained by small size, low power, and the need for appropriate placement.

In the following subsections, the assumptions related to the SNs and the MS are presented, together with the radio energy consumption model applied in the intended WSN.

3.1. Network Assumptions

In the proposed work, as shown in Figure 2, SNs are deployed randomly in a given geographical area $A$ with width $W$ and height $H$. This area is served by an MS that is responsible for collecting the sensory data from the SNs. The following assumptions are made about the WSN under consideration:
(1) $S = \{s_1, s_2, \ldots, s_N\}$ represents a set of $N$ homogeneous SNs deployed over $A$, where battery recharging or replacement is not possible for any SN.
(2) Each SN $s_i$ has a 2D position $(x_i, y_i)$ in the region $A$, which is determined after deployment using its built-in GPS.
(3) Each SN has a transmission range $R$ and can communicate with another SN if the Euclidean distance $d(s_i, s_j) \le R$, where:
$$d(s_i, s_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{1}$$
(4) A stationary BS with an infinite energy supply collects data from the MS every round.
(5) The area $A$ is divided into a number of equal-sized clusters. Each cluster has a number of SNs known as CMs and a head, called the CH, which serves as a local sink for the SNs of the cluster. The CH aggregates the sensory data from the CMs.
(6) An MS with a sufficient amount of battery life and computational capacity is used to gather data from SNs in subsequent rounds. The MS hovers over the WSN and visits a selected set of positions called RVPs.
(7) The MS travels at a constant speed of $v$ m/s during each round.
(8) Based on the MS’s position, the CHs modify their transmission range to fit within the MS’s range.
(9) All SNs have the same initial energy of $E_0$ joules. If an SN does not have enough power remaining to transmit a packet to the CH, it is deemed dead.

3.2. Energy Model

In this paper, the energy consumption of SNs is assessed using the first-order radio energy model [38]. Normally, the SN’s energy is dissipated during data sensing, processing, transmission, and reception, as well as in analog-to-digital conversion. At receivers, the SNs dissipate energy for radio electronics, whereas at transmitters, they dissipate energy for radio electronics and power amplifiers. Let the transmitter or receiver dissipate $E_{elec}$ energy per bit in its circuit. Typically, channel coding, modulation, filtering, and spreading have an impact on $E_{elec}$. Let $\varepsilon_{fs}$ and $\varepsilon_{mp}$ represent, respectively, the energy needed to send a bit over a given distance in free space and in a multipath fading channel. The transmission distance threshold $d_0$ is given as follows:

$$d_0 = \sqrt{\frac{\varepsilon_{fs}}{\varepsilon_{mp}}} \tag{2}$$

The dissipated energy for transmitting $k$ bits over distance $d$ between SNs $s_i$ and $s_j$ is expressed as follows:

$$E_{TX}(k, d) = \begin{cases} k E_{elec} + k \varepsilon_{fs} d^{2}, & d < d_0 \\ k E_{elec} + k \varepsilon_{mp} d^{4}, & d \ge d_0 \end{cases} \tag{3}$$

The dissipated energy to receive $k$ bits at an SN is given as follows:

$$E_{RX}(k) = k E_{elec} \tag{4}$$

The dissipated energy at a given CH due to aggregating $k$ bits from each of $n$ SNs is given as follows:

$$E_{agg}(k, n) = n \, k \, E_{DA} \tag{5}$$

where $E_{DA}$ denotes the energy dissipated to aggregate one bit. The dissipated energy at a given CH consists of three components: receiving, aggregating, and transmitting. Based on Equations (3)–(5), the total consumed energy of a given CH in each round is given as follows:

$$E_{CH} = n E_{RX}(k) + E_{agg}(k, n) + E_{TX}(k_{CH}, d_{CH,MS}) \tag{6}$$

where $k_{CH}$ and $d_{CH,MS}$ denote the data payload and the Euclidean distance from a CH to the MS, respectively.
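
The first-order radio model described above can be sketched in a few lines of Python. This is an illustrative sketch only: the parameter values are figures commonly used with this model in the literature and are assumptions here, not values taken from this paper, and the function names (`tx_energy`, `rx_energy`, `agg_energy`, `ch_round_energy`) are hypothetical.

```python
import math

# Assumed, commonly used parameter values (not taken from this paper).
E_ELEC = 50e-9       # J/bit, radio electronics (transmit or receive)
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier
E_DA = 5e-9          # J/bit, data aggregation

# Distance threshold separating the free-space and multipath regimes.
D0 = math.sqrt(EPS_FS / EPS_MP)


def tx_energy(k_bits, d):
    """Energy to transmit k_bits over distance d (metres)."""
    if d < D0:
        return k_bits * (E_ELEC + EPS_FS * d ** 2)
    return k_bits * (E_ELEC + EPS_MP * d ** 4)


def rx_energy(k_bits):
    """Energy to receive k_bits."""
    return k_bits * E_ELEC


def agg_energy(k_bits, n_signals):
    """Energy to aggregate n_signals packets of k_bits each."""
    return n_signals * k_bits * E_DA


def ch_round_energy(k_bits, n_members, payload_bits, d_to_ms):
    """Per-round CH energy: receive + aggregate + transmit to the MS."""
    return (n_members * rx_energy(k_bits)
            + agg_energy(k_bits, n_members)
            + tx_energy(payload_bits, d_to_ms))
```

With these assumed values the threshold distance is roughly 88 m, so a CH that reaches the MS within that distance stays in the cheaper quadratic regime.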

4. The Proposed Scheme

In the envisaged scheme, a significant portion of the computation and communication is offloaded from the SNs and CHs to the MS and BS. The scheme operates in two modes: configure and operational. The configure mode carries out the necessary calculations in advance to reduce the energy consumed while the topology is running, so that the necessary information is available to both the BS and the MS. The operational mode, in turn, achieves two main objectives: performing the minimum mandatory operations for running the topology and establishing an optimal route for the MS to upload the data gathered from the CHs.

4.1. Configure Mode

The configure mode is a pivotal aspect of our proposed scheme, where the computational burden is centralized at the BS. This approach ensures that SNs and the MS are not overwhelmed with continuous computational tasks, thereby minimizing the impact on their resources and energy consumption. As illustrated in Figure 3, the configure mode comprises three consecutive processes: cluster construction, TDMA scheduling, and the selection of CHs.

4.1.1. Cluster Construction

This process presents a novel method of partitioning the monitored geographical area $A$ into an even number of equal-size clusters, where $w$ and $h$ denote the width and height of each generated cluster. The BS organizes the generated clusters in rows and columns, where the number of clusters in each row and in each column must be even. As shown in Figure 4, the BS first computes an initial width $w_0$ and an initial height $h_0$ for each cluster from the transmission range $R$ (Equations (7) and (8)).

To ensure that the number of generated clusters is even in both the row and column directions, the BS adjusts $w$ and $h$ starting from Equations (7) and (8). Let $n_w$ be an integer obtained with the floor operation (Equation (9)), and let $b_w$ be a binary variable (Equation (10)). Using Equations (7)–(10), the value of $w$ is recalculated (Equation (11)). The correction term in Equation (11) reduces the width of each cluster to make room for one additional cluster, so Equation (11) guarantees an even number of clusters per row. In a similar way, the number of clusters in each column is adjusted to an even number: let $n_h$ be an integer (Equation (12)) and $b_h$ a binary variable (Equation (13)). Using Equations (8), (12), and (13), the value of $h$ is recalculated (Equation (14)), which leads to an even number of clusters within each column.

The total number of generated clusters within the area $A$ is given as follows:

$$K = \frac{W}{w} \cdot \frac{H}{h} \tag{15}$$

It is clear that $K$ is a multiple of four, since the numbers of clusters per row and per column are both even. Let $C$ be the set of clusters generated by the cluster construction process. After the clusters are formed, starting from left to right and from top to bottom of the cluster topology, every four adjacent clusters form a group, as shown in Figure 4. Let $G = \{g_1, g_2, \ldots, g_M\}$ be the set of generated groups, where $M = K/4$ and $c_{g,j}$, $j \in \{1, 2, 3, 4\}$, denotes cluster number $j$ within group $g$. Each cluster $c_{g,j}$ contains $n_{g,j}$ SNs. It is clear that:

$$\sum_{g=1}^{M} \sum_{j=1}^{4} n_{g,j} = N \tag{16}$$

The vertex at which the four clusters of a group meet is the group meeting position (GMP), where $p_g$ represents the GMP of the group $g$.
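
The grid construction and grouping described above can be sketched as follows. This is a hedged illustration, not the paper's exact formulas: the initial cluster size (`w0`, `h0`) is taken as an input because its derivation from the transmission range is not reproduced here, and the even-count adjustment (take the floor of the ratio, then add one when the count is odd) is an assumption consistent with the description. All function names are hypothetical.

```python
import math


def even_cluster_count(extent, initial_size):
    """Number of clusters along one axis, forced to be even (at least 2).

    Assumption: start from floor(extent / initial_size) and add one
    cluster when that count is odd, which shrinks every cluster slightly.
    """
    n = max(math.floor(extent / initial_size), 1)
    if n % 2 == 1:  # parity flag, playing the role of the binary variable
        n += 1
    return n


def build_grid(width, height, w0, h0):
    """Return (cols, rows, w, h, gmps) for an area of width x height."""
    cols = even_cluster_count(width, w0)
    rows = even_cluster_count(height, h0)
    w, h = width / cols, height / rows
    # Every 2x2 block of adjacent clusters is a group; its GMP is the
    # vertex shared by the four clusters.
    gmps = [((2 * gc + 1) * w, (2 * gr + 1) * h)
            for gr in range(rows // 2) for gc in range(cols // 2)]
    return cols, rows, w, h, gmps


def locate(x, y, w, h, cols, rows):
    """Map a position to (group index, cluster index within the group)."""
    c = min(int(x // w), cols - 1)
    r = min(int(y // h), rows - 1)
    group = (r // 2) * (cols // 2) + (c // 2)
    within = (r % 2) * 2 + (c % 2)
    return group, within
```

For example, a 100 m by 100 m area with an assumed initial cluster size of 30 m yields three clusters per axis before adjustment and four after it, so each cluster is 25 m wide and the area holds four groups.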

4.1.2. TDMA Schedule

Originally, WSNs employ broadcast communication in which multiple devices may emit signals simultaneously, leading to collisions and signal destruction. To address this issue, TDMA is employed as a scheduling algorithm to coordinate a group of SNs in transmitting their data within a predetermined frame. This frame is partitioned into equal time slots, allocating one slot to each SN for transmission [15].

The proposed scheme is intended to reduce the sources of energy loss and to narrow the gap in residual energy among all SNs in each cluster. As mentioned above, each cluster $c_{g,j}$ has $n_{g,j}$ SNs deployed in its area, one of which is selected as a CH as described in the next section. At the BS, the TDMA scheduling is modified to define a frame $F_{g,j}$ of equal time slots for each cluster $c_{g,j}$, where $g \in \{1, \ldots, M\}$ and $j \in \{1, 2, 3, 4\}$. The number of time slots in each $F_{g,j}$ equals the number of SNs $n_{g,j}$, and each slot is associated with exactly one SN $s_i \in c_{g,j}$. Consequently, the frame length differs from cluster to cluster according to the number of SNs. In the operational mode, each SN has one chance per round to transmit its sensed data to the CH, according to its order in the frame $F_{g,j}$. In some cases, the frame length is greater than the number of active SNs in the cluster, which happens when some of the SNs within the cluster are dead. In traditional TDMA scheduling, when the frame length is greater than the number of SNs, some SNs get more than one chance to transmit their data in the same round, which drains their power quickly. To prevent an SN from sending data more than once, the SN switches to sleep mode once it has sent its data within its associated slot and switches back to wakeup mode in the next round. Once the TDMA scheduling frame is formed at the BS, the BS broadcasts $F_{g,j}$ to all SNs within $c_{g,j}$ in the first round only.
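
The modified TDMA behavior described above (one slot per SN, one transmission per round, dead SNs leave their slot unused) can be illustrated with a small sketch. The function names and the representation of a frame as a list of SN identifiers are assumptions made for illustration.

```python
def build_frame(sn_ids):
    """One slot per SN in a fixed order; frame length equals cluster size."""
    return list(sn_ids)


def round_transmissions(frame, dead):
    """Return (slot, sn) pairs for one round.

    Each live SN transmits only in its own slot and then sleeps until the
    next round. The slot of a dead SN stays unused (None) instead of being
    handed to another SN.
    """
    schedule = []
    for slot, sn in enumerate(frame):
        schedule.append((slot, None if sn in dead else sn))
    return schedule
```

With a four-SN frame in which one SN has died, the round still has four slots, but only three transmissions take place and no SN transmits twice.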

4.1.3. Cluster Head Selection

To the best of our knowledge of the literature, the computation to select the optimal CH is generally performed among the SNs of each cluster. As a result, the SNs lose a significant amount of energy due to the high communication overhead between them. The proposed scheme introduces a novel approach that splits the CH selection process for each cluster into two stages. The first stage is carried out initially at the BS and contains most of the computation for CH selection. The second stage is carried out during the operational mode.

In the first stage, the BS executes the CH scheduling algorithm (Algorithm 1) for all clusters in parallel. In this algorithm, for each group $g$ and for each cluster $c_{g,j}$, the distance between SN $s_i$ and the GMP $p_g$ is computed as follows:

$$d(s_i, p_g) = \sqrt{(x_i - x_{p_g})^2 + (y_i - y_{p_g})^2} \tag{17}$$

where $(x_i, y_i)$ and $(x_{p_g}, y_{p_g})$ denote the positions of the SN $s_i$ and the GMP $p_g$. The intradistance between the SN $s_i$ and the remaining SNs within the cluster $c_{g,j}$ is given as follows:

$$D(s_i) = \sum_{s_m \in c_{g,j},\, m \ne i} d(s_i, s_m) \tag{18}$$

where $s_i, s_m \in c_{g,j}$ and $m \ne i$. For SN $s_i$, let $T(s_i)$ be the total distance, given based on Equations (17) and (18) as follows:

$$T(s_i) = d(s_i, p_g) + D(s_i) \tag{19}$$

Construct the cluster order vector corresponding to the SNs within the cluster $c_{g,j}$:

$$V_{g,j} = (o_1, o_2, \ldots, o_{n_{g,j}}) \tag{20}$$

which orders the SNs within the cluster based on their total distances given in Equation (19). In particular, $o_i = 1$ means that the SN with identification $i$ has the smallest total distance $T(s_i)$, and $o_i = 2$ means that the SN with identification $i$ has the second smallest total distance. The order variable $o_i$ within the vector $V_{g,j}$ represents the order in which SN $s_i$ works as a CH. Algorithm 1 illustrates the formulation of the CH selection order vector in the first stage. Finally, the BS constructs a message containing the vector $V_{g,j}$ and sends it to all SNs within the cluster $c_{g,j}$. For $r = 1$, the SN $s_i$ is selected as a CH in the first round if $o_i = 1$, where $s_i \in c_{g,j}$, $g \in \{1, \ldots, M\}$, and $j \in \{1, 2, 3, 4\}$. In effect, Algorithm 1 gives each SN a chance to be a CH in a specific round, where an SN is selected as a CH based on the mapping between the vector $V_{g,j}$ and the round number. This relation is explained through the sensor’s self-encoding module (SEM) in the operational mode, which represents the second stage of the CH selection process.

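Algorithm 1: CH scheduling at the BS (first stage of CH selection).
Execute the cluster construction process
for each group $g \in G$ do
 for each cluster $c_{g,j} \in g$ do
  for each SN $s_i \in c_{g,j}$ do
   Calculate $d(s_i, p_g)$ using Equation (17)
   for each $s_m \in c_{g,j}$ and $m \ne i$ do
    Accumulate $D(s_i)$ using Equation (18)
   end for
   Set $T(s_i) = d(s_i, p_g) + D(s_i)$
   Push the tuple $(T(s_i), i)$ into vector $V_{g,j}$
  end for
  Sort the vector $V_{g,j}$ so that the smallest total distance comes first
  for $k = 1$ to $n_{g,j}$ do
   Label the SN identified by the second item of the $k$-th pair in $V_{g,j}$ with order $o = k$
  end for
  Send the vector $V_{g,j}$ to cluster $c_{g,j}$
 end for
end for
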
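
The per-cluster part of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: positions are plain `(x, y)` tuples keyed by SN identifier, the function name `ch_order` is hypothetical, ranking is by smallest total distance first (as the text describes for the first and second entries), and ties are broken by SN identifier, which the source does not specify.

```python
import math


def ch_order(positions, gmp):
    """Compute the CH order for one cluster.

    positions: dict mapping SN id -> (x, y) for the SNs of the cluster.
    gmp: (x, y) of the group meeting position.
    Returns a dict mapping SN id -> order (1 = serves as CH first).
    """
    totals = []
    for i, pi in positions.items():
        d_gmp = math.dist(pi, gmp)  # distance from the SN to the GMP
        d_intra = sum(math.dist(pi, pj)  # distances to the other SNs
                      for j, pj in positions.items() if j != i)
        totals.append((d_gmp + d_intra, i))
    totals.sort()  # smallest total distance first; ties by SN id
    return {sn: rank for rank, (_, sn) in enumerate(totals, start=1)}
```

Because the whole computation runs at the BS, the SNs only receive the resulting order and never exchange messages to elect a CH.
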
4.2. Operational Mode

This mode is executed in successive global rounds $r$. When activated, the network components wake up to perform the operations required to route data from each SN up to the BS. This trip consists of the CH activation process and the MS trajectory planning process.

4.2.1. CH Activation Process

Initially and only once, for each cluster $c_{g,j}$, the BS sends a wakeup message to all SNs within $c_{g,j}$. This is the first communication message that the SNs receive. The wakeup message is composed of two items: the TDMA scheduling frame $F_{g,j}$ and the CH selection order vector $V_{g,j}$, both formed in the configure mode. The values of $F_{g,j}$ and $V_{g,j}$ are used by the SEM of every SN within $c_{g,j}$ to perform the routing operation. In addition, each SEM maintains a dedicated local round variable $r_{g,j}$, defined as the current round, which counts the number of rounds executed within $c_{g,j}$. The local round is initialized identically in all SNs within the cluster $c_{g,j}$: when the global round begins (i.e., $r = 1$), $r_{g,j}$ is set to 1 by the SEM of all SNs in $c_{g,j}$.

Based on the vector $V_{g,j}$, the SN $s_i$ elects itself as a CH if $o_i = r_{g,j}$. The election process is executed without any communication between the SNs themselves or between the SNs and their CHs. As a result, the energy consumption of the network is significantly reduced. As shown in Algorithm 2, the CH selection is carried out during the operational mode as follows:
(1) Step 1: If the value of $o_i$ of a given SN is equal to the current round $r_{g,j}$, then the SN employs itself as a CH while the remaining SNs within $c_{g,j}$ employ themselves as CMs.
(2) Step 2: After the current CH of $c_{g,j}$ collects the sensory data from its CMs and sends it to the MS, it switches itself to be a CM in the next round.
(3) Step 3: If the current CH is the last SN within the vector $V_{g,j}$, then all SNs within $c_{g,j}$, including the CH itself, reset the value of $r_{g,j}$ so that the rotation restarts from the first SN in the vector (Equation (21)), where $|V_{g,j}|$ is the cardinality of $V_{g,j}$. All SNs then follow Step 1 again.
(4) Step 4: If the current CH is not the last SN within the vector $V_{g,j}$, then all SNs within $c_{g,j}$, including the CH itself, increase the value of $r_{g,j}$ by 1 and follow Step 1 again.

After numerous rounds have been repeated, one or more SNs may lose most of their energy. Let be the threshold energy that is sufficient to enable an SN to work as a CH. The remaining energy of the SN at the start of a given round is denoted as . If is less than , then the SN sends an announce message to all SNs to inform them of its energy status. Based on this message, the following actions are executed:
(1) The SN switches to sleep mode and its assigned slot changes to unused mode. A slot in unused mode cannot be used by the other SNs.
(2) All SNs that come after in the vector decrease their CH order by .

1: Execute the cluster construction process
2: for each group do
3:  for each cluster do
4:   for to do
5:    for to do
6:     if then
7:      Select as CH
8:      break
9:     end if
10:    end for
11:    Aggregates data at
12:    Send data to the MS.
13:    Switch to member state.
14:    if then
15:     Set to sleep mode
16:     Set the slot of to unused mode
17:     for to do
18:      Set
19:     end for
20:    end if
21:   end for
22:  end for
23: end for
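The CH activation steps above can be sketched for a single cluster as follows. This is a minimal illustration, not the authors' implementation: the names (`SensorNode`, `run_rounds`, and so on) are hypothetical, the CH order is assumed to be 1-based, the local round is assumed to wrap back to 1 after the last SN in the order vector, and the per-round energy costs are placeholder values rather than the energy model of Section 3.2.

```python
from dataclasses import dataclass


@dataclass
class SensorNode:
    node_id: int
    ch_order: int          # position in the CH selection order vector (assumed 1-based)
    energy: float          # residual energy
    asleep: bool = False


def is_cluster_head(node, local_round):
    """Self-election: the node compares two local values, no messages are exchanged."""
    return (not node.asleep) and node.ch_order == local_round


def next_local_round(local_round, active_count):
    """Advance the local round; wrap to 1 after the last SN (assumed reset rule)."""
    return 1 if local_round >= active_count else local_round + 1


def retire_low_energy(nodes, threshold):
    """Put SNs below the CH energy threshold to sleep and move later SNs up by one."""
    for node in sorted(nodes, key=lambda n: n.ch_order):
        if not node.asleep and node.energy < threshold:
            retired_order = node.ch_order
            node.asleep = True
            node.ch_order = 0  # removed from the rotation
            for other in nodes:
                if not other.asleep and other.ch_order > retired_order:
                    other.ch_order -= 1


def run_rounds(nodes, rounds, threshold, ch_cost, cm_cost):
    """Return the list of CH ids elected over the given number of rounds."""
    local_round = 1
    heads = []
    for _ in range(rounds):
        retire_low_energy(nodes, threshold)
        active = [n for n in nodes if not n.asleep]
        if not active:
            break
        if local_round > len(active):
            local_round = 1
        head = next(n for n in active if is_cluster_head(n, local_round))
        heads.append(head.node_id)
        for n in active:
            n.energy -= ch_cost if n is head else cm_cost  # placeholder costs
        local_round = next_local_round(local_round, len(active))
    return heads
```

Because every SN holds the same order vector and the same local round counter, each node reaches the same decision independently, which is why no election messages are needed.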

To make the CH activation process easier to follow, Figure 5 shows an illustrative example. In this example, we focus on cluster and cluster . Four SNs are deployed in (i.e., ) and six SNs are deployed in (i.e., ). The TDMA scheduling frames slots and slots. Initially, global round , the local round and local round . The vectors and are given as and . For the first round, the SNs and are selected as CHs. For the second round (i.e., , , and ), the SNs and are selected as CHs. This process continues until round , and . In this case, the SNs and are selected as CHs. At , , and . In this case, the SNs and are selected as CHs, and so on.

4.2.2. Path Planning for MS

In the planned topology, the GMPs are defined at the vertex where every four adjacent clusters meet. The GMPs are positioned to accomplish three purposes: preventing the transmission range of any cluster’s sensors from exceeding the GMP position, constructing the trajectory of the MS, and identifying the initial locations of the RVPs, which represent the MS data collection positions. In principle, an efficient plan of the MS path should minimize total energy consumption. This can be achieved if the transmission distances from the CHs to the MS are also reduced. However, using some GMPs as RVP positions may not be appropriate for all successive rounds, because the CH positions change. So, some extra points may be needed to replace the unsuitable GMPs. In the following, these extra points are created as needed.

(1) The RVPs Points Formulation. As mentioned before, the MS has an unlimited energy supply and processing power comparable to that of the BS. In addition, the MS is aware of the necessary information about the formed clusters, their SNs, and the initial ordering of CHs for all clusters. The MS therefore uses this information to test whether each GMP should remain an RVP location or be replaced by a more appropriate new location. Let be the initial RVP location sequence that the MS follows to collect data from CHs, where . At the start of each round , the MS executes the following procedure to update the sequence . For each group, do the following steps.
(1) Determine the centroid of the CHs within the group : and where is the CH position of the cluster .
(2) Compute the sum of distances between the four CHs and the centroid as follows:
(3) Compute the sum of distances between the CHs and the GMP as follows: where denotes the position of the GMP .
(4) If , then replace within the sequence with the location .
(5) If , then keep within the sequence without any change.
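The per-group RVP update can be illustrated with a short sketch. The function names are hypothetical, and the comparison direction is an assumption inferred from the goal of shortening CH-to-MS distances: the centroid replaces the GMP only when it gives a strictly smaller sum of CH distances.

```python
import math


def centroid(points):
    """Arithmetic mean of the CH positions of one group."""
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))


def total_distance(points, target):
    """Sum of Euclidean distances from every CH to a candidate RVP location."""
    return sum(math.dist(p, target) for p in points)


def update_rvp(ch_positions, gmp):
    """Return the RVP location for this round: the centroid or the original GMP."""
    c = centroid(ch_positions)
    # Assumed rule: switch to the centroid only if it reduces the total CH distance.
    if total_distance(ch_positions, c) < total_distance(ch_positions, gmp):
        return c
    return gmp
```

Note that the centroid minimizes the sum of squared distances, not the sum of distances, so the comparison is not redundant: for CHs at (0, 0), (0, 0), (0, 0), and (4, 0) with a GMP at (0, 0), the GMP is kept.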

Since the sequence is updated by the MS at every round, its value may change from one round to another.

(2) The MS Path Optimization. The path trajectory starts from the BS, passes through all RVPs , and finally returns to the BS. For this purpose, the GA is employed to select the shortest path among all available paths traversing all RVPs. The MS path trajectory algorithm takes the sequence of all RVPs as input and outputs a path trajectory that traverses all RVPs along the shortest path. As shown in Algorithm 3, the process of constructing the MS path trajectory proceeds as follows:
(1) Divide the set into an arbitrary number of clusters, using the constrained k-means algorithm.
(2) Apply the GA to obtain the shortest path within each cluster , where .
(3) Connect the resulting subpaths , , , …, to obtain the MS path trajectory .

1: Input: The set , the number of clusters .
2: Output: The MS path trajectory .
3: Divide into clusters using the constrained k-means algorithm.
4: Initialize .
5: for to do
6:  Apply the GA to generate the optimal subpath of cluster .
7:  Add to .
8: end for
9: for to do
10:  Connect the last RVP of the path to the start RVP of the path .
11: end for
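The GA step of the path construction can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' algorithm: it skips the constrained k-means partition and optimizes a single closed tour BS, RVPs, BS; the operators (order crossover, swap mutation, truncation selection with elitism), the parameter defaults, and all function names are choices made for this example only.

```python
import math
import random


def tour_length(order, rvps, bs):
    """Length of the closed tour BS -> RVPs in the given order -> BS."""
    stops = [bs] + [rvps[i] for i in order] + [bs]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))


def order_crossover(p1, p2, rng):
    """Copy a random slice from p1, then fill the gaps in the order used by p2."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = [g for g in p2 if g not in kept]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child


def swap_mutation(order, rng, rate):
    """With probability `rate`, swap two RVPs in a copy of the tour."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def ga_shortest_tour(rvps, bs, pop_size=60, generations=200,
                     mutation_rate=0.2, seed=0):
    """Return a visiting order (list of RVP indices) found by a simple GA."""
    rng = random.Random(seed)
    n = len(rvps)
    if n < 2:
        return list(range(n))
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda o: tour_length(o, rvps, bs))
        elite = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            children.append(swap_mutation(child, rng, mutation_rate))
        population = elite + children
    return min(population, key=lambda o: tour_length(o, rvps, bs))
```

In the scheme described above, a routine of this kind would be run once per k-means cluster of RVPs, and the resulting subpaths would then be concatenated. A GA is a heuristic, so the returned tour is not guaranteed to be the global optimum.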

4.3. Computational Complexity Analysis

For SNs, clusters, groups, four clusters per group, and SNs for cluster within group , the time complexity of Algorithm 1 is given as follows. In Step 2, the time complexity is , considering that the number of groups is . In Step 3, the time complexity is , as each group contains only four clusters. The time complexity of Steps 4, 5, 9, 10, and 11 is given as . For Steps 6 and 7, the time complexity is given as . The time complexity of Steps 12 and 13 is given as . Therefore, the time complexity of Algorithm 1 is expressed as . The time complexity of the CH activation in Algorithm 2 is the same as that of Algorithm 1.

The time complexity of the MS path trajectory in Algorithm 3 can be determined as follows. The set of RVPs, denoted as , is partitioned into clusters, where , using the constrained k-means clustering algorithm in Step 2. Let be the number of RVPs in cluster and . It is clear that , where denotes the number of RVPs in the set , and . The loop (Steps 6–8) in Algorithm 3 iterates through each cluster and employs the GA to determine the shortest path among the RVPs within that specific cluster. Given the input size RVPs, the population size , and the number of generations , the time complexity is given in Chatterjee et al.’s [39] and Srinivas and Patnaik’s [40] studies as . For Steps 9 and 10, the time complexity is given as .

In our proposed scheme, we emphasize the significance of the configure mode, a pivotal aspect that plays a crucial role in the overall system operation. The configure mode is designed to centralize the computational burden at the BS. By adopting this approach, we ensure that SNs and the MS are not overwhelmed with continuous computational tasks. This strategic distribution of computational load minimizes the impact on the resources and energy consumption of SNs and the MS, thereby enhancing the overall efficiency and sustainability of our proposed solution.

5. Experimental Results

In this section, several metrics are used in simulation experiments to assess the performance of the proposed scheme. The results of the proposed scheme are compared with those of prevailing state-of-the-art algorithms, namely EEMSR [29], EAPC [25], BIIE [27], GA-SMT [30], and GTA-BE [33]. Both the proposed scheme and the state-of-the-art algorithms are implemented using Python 3.11.0 running on Microsoft Windows 10 Pro with an Intel Core i7 CPU at 4.7 GHz and 16 GB of RAM. The parameter values of the state-of-the-art algorithms are taken from their published papers, and the implementations are validated by comparing their results with the published outcomes. The comparison between these algorithms and the proposed scheme is based on several performance metrics: stability period, network lifetime, average energy consumption, communication MO, data transmission latency (DTL), and throughput. Moreover, the comparison is performed with various node densities. All the SNs are randomly distributed in the observation area . The BS is placed at location (100, 100). Each experiment is run 100 times for each metric, and the average results for each metric are plotted in the following subsections. Table 3 lists the additional simulation parameters considered in the experiments.

5.1. Stability Period

The stability period is measured as the total number of data gathering rounds performed before any SN’s residual energy reaches zero. Under the implemented methodology, the power consumed by each SN within a cluster is approximately equal. As a result, all the SNs of a cluster continue to perform their roles together for an extended period of time, which means the whole network remains functional for a longer duration. The network performance is therefore enhanced as the stability period is prolonged.
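As a small illustration of this definition, the stability period can be read off a per-round log of residual energies. The function name and the log layout are hypothetical, and the round in which the first SN is depleted is assumed not to count.

```python
def stability_period(energy_log):
    """Count the rounds completed while every SN still has residual energy.

    energy_log[r][i] is the residual energy of SN i at the end of round r + 1
    (assumed layout for this sketch).
    """
    for completed, energies in enumerate(energy_log):
        if min(energies) <= 0:
            return completed  # the first depletion ends the stable phase
    return len(energy_log)
```

For example, if the first SN reaches zero energy at the end of the third logged round, the function reports a stability period of two rounds.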

Figure 6 presents a comparison of the stability periods versus different numbers of SNs. When utilizing 100 sensors, the stability period of the proposed scheme outperforms EEMSR by , GTA-BE by , BIIE by , GA-SMT by , and EAPC by . Moreover, as the number of SNs increases, the proposed scheme exhibits significantly greater improvements over the compared algorithms. Specifically, when employing 500 SNs, the improvement in the stability period of the proposed scheme increases to , , , , and when compared with EEMSR, GTA-BE, BIIE, GA-SMT, and EAPC, respectively. These results are expected because an increase in the number of SNs in the proposed scheme leads to a rise in the number of SNs within each cluster. Consequently, each SN assumes the role of a CH only after a larger number of rounds, allowing it to retain residual energy for more rounds. Additionally, CH selection is a self-coding process performed without overhead messages or mutual processing among the cluster’s members. As a result, the proposed scheme is more efficient in avoiding the premature death of SNs and provides extended stability compared to other state-of-the-art algorithms.

5.2. Network Lifetime

Figure 7 illustrates the network lifetime achieved by the proposed scheme compared with EEMSR, GTA-BE, BIIE, GA-SMT, and EAPC. The X-axis shows the number of SNs that participate in each experiment. The Y-axis shows the overall network lifetime of each scheme. For 100 sensors, the simulation results show that the proposed scheme enhances the network lifetime by up to 0.1, 49.3, 73.18, 74.81, 81.68, and 84.46 when compared with EEMSR, GTA-BE, BIIE, GA-SMT, and EAPC, respectively. When 500 sensors are deployed, these results rise to 49.3%, 73.18%, 74.81%, 81.68%, and 84.46% for EEMSR, GTA-BE, BIIE, GA-SMT, and EAPC, respectively. The results for 500 sensors differ markedly from those for 100 sensors because the additional sensors increase the total amount of energy available in the network. In addition, the proposed scheme performs three significant operations to enhance the network lifetime: the modified TDMA scheduling, the self-coding technique of each SN, and the optimal routing algorithm using an MS. For each round, the modified TDMA limits the sensing time of each SN to one definite slot. This saves the energy that is wasted in the original TDMA when some sensors are used as a CH more than once in the same round. Moreover, the assignment of each sensor as a CH in a specific round is prescheduled in the configure mode. At the start of each round, each sensor executes the CH activation algorithm to determine its role in the current round, CH or CM. Accordingly, much of the energy consumed in the compared schemes by overhead communication among CMs to select a CH is avoided. Consequently, both the modified TDMA and the self-coding algorithm save a large amount of the energy lost in the compared algorithms. Finally, the optimal selection of the RVPs minimizes the distances over which the CHs transmit their aggregated data to the MS. As a result of these operations, the consumed power is reduced, and hence the network lifetime is extended. So, the proposed scheme is more effective in extending network lifetime than the remaining state-of-the-art algorithms.

5.3. Average Energy Consumption

The average energy consumption of the network is evaluated based on the energy spent by each SN in every round . To reduce the overall energy consumption, the energy consumption of each SN should be balanced during the network lifetime. The can be deduced as follows:

Figures 8 and 9 compare the average energy consumption of the proposed scheme with that of the other state-of-the-art algorithms. For 100 SNs, Figure 8 reveals that the proposed scheme reduces the average energy consumption by , , , , and compared to EEMSR, GTA-BE, BIIE, GA-SMT, and EAPC, respectively. For 500 SNs, Figure 9 indicates that the proposed scheme reduces the average energy consumption by , , , , and compared to EEMSR, GTA-BE, BIIE, GA-SMT, and EAPC, respectively.

Increasing the number of SNs increases the total amount of energy available in the network. According to the energy model presented in Section 3.2, the energy dissipated by an SN is highest when it plays the role of the CH. According to the scheduled CH selection in Algorithm 1 of the proposed scheme, each SN is given a chance to be a CH in a specific round. In addition, this chance is not repeated until all the remaining SNs have been given the same chance. Consequently, the energy of each SN is saved when its role as a CH is delayed, which is what happens when the number of SNs within each cluster increases. In addition, CH selection is performed based on the self-coding of each SN. This means that no overhead communication is performed for the CH selection. Avoiding the overhead communication leads to very low power consumption compared with the other algorithms. Finally, the optimal selection of the RVPs minimizes the distance over which each CH transmits its aggregated data to the MS. Minimizing this distance minimizes the power consumed by the aggregated data transmission. As a result, the proposed method combines several techniques that avoid much of the energy consumed by the compared methods. Hence, the proposed scheme is more effective in extending network lifetime than the remaining state-of-the-art algorithms. However, the enhanced energy efficiency comes at the cost of some increase in the latency of the routed data. This increase is caused by the additional time needed to gather data from the extra sensors at the CH and then transmit it to the MS. This drawback can be avoided by employing more than one MS to minimize the data routing latency.

5.4. Message Overheads

MO is defined as the number of control messages transmitted among the SNs, MS, and BS during the network configure and operational modes. This metric should be minimized, because an increase in these messages increases collisions and the power consumption of the deployed SNs. Two kinds of overhead messages are employed in the proposed scheme. First, in the configure mode, the BS transmits a single message to all SNs that contains the TDMA frame and the CH order vector for each cluster. Second, in the operational mode, an SN that loses its ability to be a CH sends a single message to all the CMs in its cluster. So, the total number of overhead messages within the network is reduced compared with the remaining state-of-the-art algorithms. Figure 10 compares the proposed scheme with these state-of-the-art algorithms in terms of MO. The MO is reduced by up to compared to EEMSR, up to compared to GTA-BE, up to compared to BIIE, up to compared to GA-SMT, and up to compared to EAPC. This decline is due to the one-time network topology establishment process and the scheduling algorithm for CH selection, which minimize excessive message exchange in the network.

5.5. Data Transmission Latency

DTL is the time needed to transmit data from the SNs to the BS through the MS. It is computed by summing the overall time required to visit each RVP hovering location and gather the data from each CH. So, selecting the optimal number of RVPs reduces the DTL. As a result, the reliability and overall performance of the network are improved.

Let denote the MS speed and denote the MS hovering time at RVP location . The data transmission latency, , consists of two components: (1) the overall MS trajectory time , where and and is the number of clusters generated by the k-means algorithm; and (2) the time the MS spends hovering over each RVP location . Consequently, the can be given as follows:

Figure 11 illustrates the noticeable reduction in DTL achieved by the proposed scheme relative to the remaining algorithms. The DTL increases as the number of sensors increases. When 100 sensors are used, the DTL of the proposed scheme is reduced by up to 15.3% compared to EEMSR, up to 72.5% compared to GTA-BE, up to 75.5% compared to BIIE, up to 87.9% compared to GA-SMT, and up to 81.5% compared to EAPC. This reduction is achieved for two main reasons. First, the optimal number of RVPs is selected. Second, the positions of the RVPs are designed to allow the MS to serve four CHs at each move, whereas in the other algorithms each RVP is selected to serve one CH. So, the proposed scheme outperforms the compared algorithms, and its MS is able to accomplish the data gathering process in less time.
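The two latency components described above (travel time along the MS trajectory plus hovering time at each RVP) can be sketched as follows. The function and parameter names are hypothetical, and the trajectory is treated as a single ordered list of RVPs rather than as a concatenation of per-cluster subpaths.

```python
import math


def path_length(waypoints):
    """Total Euclidean length of a polyline through the given waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


def data_transmission_latency(bs, rvps, hover_times, speed):
    """Travel time of the closed tour BS -> RVPs -> BS plus all hovering times."""
    if len(rvps) != len(hover_times):
        raise ValueError("one hovering time is required per RVP")
    if speed <= 0:
        raise ValueError("MS speed must be positive")
    travel_time = path_length([bs, *rvps, bs]) / speed
    return travel_time + sum(hover_times)
```

With the BS at (0, 0), RVPs at (3, 4) and (3, 0), a hovering time of 2 time units at each RVP, and a speed of 2 distance units per time unit, the tour is 12 units long, so the latency is 6 + 4 = 10 time units. This also shows why fewer RVPs, each serving several CHs, shorten both terms.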

According to the aforementioned network model, the target geographic area is divided into equal-sized clusters based on the range of the identical sensors. In addition, the members of each cluster perform their tasks independently of the members of other clusters. Thus, the WSN can be scaled simply by adding clusters to cover the extended area, with each additional cluster supplied with the required number of SNs. This scaling may lengthen the path trajectory of the MS as it moves to gather the aggregated data from the CHs of the different clusters, and the longer path may adversely affect the delay bound of the collected data. However, this problem can be avoided by assigning more than one MS, each with a different path. This means that the whole area is partitioned into more than one section, where each section has its own MS and each MS has its own path. So, the architecture of the proposed scheme can be designed to be flexible and to scale to large numbers of nodes. This allows easy adaptation to different environments and supports extensive data collection and monitoring in a wide range of applications.

5.6. Throughput Analysis

The network throughput is defined as the ratio of the aggregate packets received by the BS to the total number of packets aggregated at the CHs and transmitted to the MS at the RVPs [23]. Figure 12 illustrates the mean throughput of the proposed scheme in comparison to other state-of-the-art algorithms. The X-axis represents the total number of SNs, while the Y-axis shows the average throughput as a percentage over 5,000 rounds. From the figure, we note that the proposed scheme exhibits an enhanced average throughput, surpassing EAPC, GA-SMT, BIIE, GTA-BE, and EEMSR by 60.27%, 54.1%, 45.45%, 36.9%, and 4.9%, respectively. The proposed scheme consistently maintains a high average throughput throughout the majority of rounds. This is attributed to the balanced energy consumption across the SNs over the network’s lifespan, which ensures that each SN preserves its residual energy for an extended duration.

5.7. Statistical Analysis

The one-way analysis of variance (ANOVA) is a statistical analysis widely employed across diverse fields such as psychology, biology, sociology, and business [41, 42]. It is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups [43, 44]. In this paper, we conduct a one-way ANOVA test on the sample data generated by both the proposed algorithm and the state-of-the-art algorithms, including EEMSR, GTA-BE, BIIE, GA-SMT, and EAPC. The test covers the performance metrics stability period, network lifetime, average energy consumption, communication MO, DTL, and throughput. Additionally, we use the same parameters as listed in Table 3.

The ANOVA test determines whether the null hypothesis (which states that the means of the given algorithms are the same) can be rejected, thereby accepting the alternative hypothesis (indicating that the means of the algorithms are significantly different), or whether the null hypothesis is accepted and the alternative hypothesis is rejected [44]. Let the null hypothesis be as follows: where represents the mean of algorithm . The ANOVA table provides a comprehensive summary of various statistical measures, including sources of variation, sum of squares, degrees of freedom, mean squares, F-statistic, p-value, and F-critical value. It delineates three primary sources of variation: between-group variation, which accounts for differences between group means; within-group variation, reflecting differences within each group; and total variation, representing the overall variability in the dataset. The significance level is set to .

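Typically, if the p-value is less than the chosen significance level and the F-statistic exceeds the F-critical value, then the null hypothesis in Equation (27) is rejected, and the alternative hypothesis in Equation (28) is accepted. Otherwise, the null hypothesis is accepted, and the alternative hypothesis is rejected. Therefore, we encounter the following two conditions:
(1) Condition 1: Reject the null hypothesis and accept the alternative hypothesis if the p-value is less than the significance level and the F-statistic exceeds the F-critical value.
(2) Condition 2: Accept the null hypothesis and reject the alternative hypothesis if Condition 1 is not satisfied.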
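The quantities in the ANOVA table and the decision rule can be illustrated with a dependency-free sketch. The function names are hypothetical. The p-value and the F-critical value are taken as inputs here because computing them requires the F distribution, which the Python standard library does not provide; the significance level is likewise left as a parameter.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares, and F.

    `groups` is a list of samples, one list of measurements per algorithm.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group variation: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of each sample around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


def reject_null(f_stat, p_value, f_crit, alpha):
    """Condition 1: reject the null hypothesis only if both tests agree."""
    return p_value < alpha and f_stat > f_crit
```

In practice, the p-value and F-critical value can be obtained from a statistics library, for example with `scipy.stats.f.sf` and `scipy.stats.f.ppf` evaluated at the between-group and within-group degrees of freedom.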

The ANOVA results, displayed in Table 4, summarize the performance of the six algorithms across multiple metrics: stability period, network lifetime, average energy consumption, communication MO, DTL, and throughput. Notably, five of these metrics meet Condition 1, leading to the rejection of the null hypothesis. However, for communication MO, the null hypothesis is accepted. For the five metrics that meet Condition 1, this indicates that at least one of the algorithm means differs significantly from the others. However, ANOVA alone does not reveal which specific algorithm or algorithms contribute to this difference, which necessitates Tukey’s honest significant difference (HSD) test [41]. Tables 5–9 present the results obtained from Tukey’s HSD test. In these tables, a “reject” value in the last column signifies the rejection of the null hypothesis for the corresponding pair of algorithms. These results indicate that the proposed algorithm differs significantly from the other algorithms. It should be noted, however, that although the proposed algorithm improves throughput and latency by 4.9% and 15.3%, respectively, in comparison with the EEMSR algorithm, as reported in Sections 5.5 and 5.6, these improvements are statistically insignificant according to Tukey’s HSD test, as shown in Tables 8 and 9.

6. Conclusion

This paper introduces an intelligent energy-efficient data routing scheme for WSNs utilizing an MS. The operations of the proposed scheme are performed in two successive modes: configure and operational modes. The configure mode includes cluster construction, TDMA scheduling, and CH selection. The self-encoding module in each SN is used to pick the CH in the operational mode without any communication between the CMs within each cluster. For the MS-based data routing, the optimal number of RVPs is selected, from which the MS’s best path is determined using k-means clustering and the GA. Simulations and analysis have been carried out to confirm the effectiveness of the proposed scheme. The simulation results show that the proposed scheme is more efficient than current state-of-the-art algorithms such as GTA-BE, BIIE, GA-SMT, EAPC, and EEMSR in terms of stability period, network lifetime, average energy consumption, DTL, MO, and average throughput. The findings are further supported by statistical validation through hypothesis testing and Tukey’s honest significant difference analysis.

The implementation of the proposed scheme may depend on the environment of the intended scenario. For instance, scenarios such as border monitoring or traffic monitoring may require specialized knowledge and expertise to determine appropriate locations for the WSN. One issue affecting these scenarios is interference with the signals of other wireless sensor devices, which degrades network performance and reliability. In future work, we aim to enhance the proposed scheme to handle scalability and obstacle issues. In addition, we will investigate the simultaneous deployment of multiple MSs at large scale to cover different areas.

Data Availability

No underlying data were collected or produced in this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.