Abstract

A wireless sensor network is a sensing system composed of anywhere from a few to thousands of sensor nodes. These nodes, however, are powered by internal batteries, which cannot be recharged or replaced and have a limited lifespan. Traditional two-tier networks with one sink node are thus vulnerable to communication gaps caused by nodes dying when their batteries are depleted. In such cases, some nodes are disconnected from the sink node because intermediary nodes on the transmission path are dead. Energy load balancing is a technique for extending the lifespan of node batteries, thus preventing communication gaps and extending the network lifespan. However, while energy conservation is important, strategies that make the best use of available energy are also important. To decrease transmission energy cost and prolong network lifespan, a three-tier wireless sensor network is proposed, in which the first level is the sink node and the third-level nodes communicate with the sink node via the service sites on the second level. Moreover, this study aims to minimize the number of service sites to decrease the construction cost. Statistical evaluation criteria are used as benchmarks to compare traditional methods and the proposed method in the simulations.

1. Introduction

Wireless sensor networks (WSNs) are spatially distributed autonomous sensors used to monitor physical or environmental conditions, such as pressure, sound, and temperature. WSNs are composed of common sensor nodes and sink nodes [1, 2]; the common sensor nodes cooperatively pass their data through the network to a sink node. The development of wireless sensor networks was originally motivated by military applications such as remote sensing or data collection in dangerous or remote environments [3]. Today, these networks are used in many industrial and consumer applications and have become part of daily life. WSNs are built of a few to several hundred or even thousands of nodes, where each node can connect with one or more sensors. Each sensor node consists of several parts, namely, a transceiver, a sensing device, and an energy source. These sensor nodes differ in size and cost, which results in corresponding constraints on resources such as energy, memory, and computational speed [4–7]. Their energy source is usually a battery, which is often undesirable or infeasible to replace or recharge [8–10]. Therefore, network lifespan is a vital concern in the construction of a WSN [11]. However, in two-tier network structures, energy consumption between inner nodes (the nodes close to the sink node) and outer nodes (the nodes far away from the sink node) is always unbalanced, and this imbalance is uncontrolled. Sink nodes, the only nodes that act as control and processing centers, collect all the valuable packets from the sensor nodes via a predefined routing path. The inner nodes not only transfer their own sensed data but also relay data from outer nodes. Thus, inner nodes consume more energy than outer nodes do. The more energy a node uses, the earlier it depletes its battery. The worst case arises when the depleted node is the only communication link between outer nodes and the sink node.
In this network structure, if even a few inner nodes die, many outer nodes will be affected. In this situation, several service sites that take over part of the functions of a sink node become necessary, and the sensor nodes then send their data to the nearest service site instead of the sink node. This also decreases the workload on inner nodes and extends the lifespan of the overall network. This paper focuses on developing a method to determine the optimal number of service sites for a given network. The cost of deploying and constructing a service site is much greater than that of a common sensor; thus, the network should contain only the minimum number of service sites needed to satisfy the full coverage demand.

Given n nodes with specified pairwise distances, k centers must be chosen for groups of nodes so as to minimize the maximum distance between a node and its center. This is the k-center problem. The goal of this paper is to minimize the number of service sites in a wireless sensor network, thus reducing the construction cost that service sites add to a three-tier network. More importantly, this three-tier network must satisfy the full coverage requirement. The number of service sites corresponds to k in the k-center problem; however, here k is not known in advance. One of the most popular methods for solving the k-center problem is the farthest first method [12]; although this method guarantees a 2-approximation, it is not perfect. This paper proposes a new scheme, HHSG, to solve the service site problem. The name "HHSG" is an abbreviation of "Huffman coding," "Hilbert curve," "Sudoku puzzle," and "genetic algorithm," because the scheme draws on the concepts behind all four. Furthermore, several other methods are simulated and applied to wireless sensor networks for comparison.

The remainder of this paper is structured as follows: Section 2 reviews background work on Hilbert curves, the k-center problem, and wireless sensor networks, and also describes related work on basic genetic algorithms, Sudoku, and Huffman codes. Section 3 describes the HHSG process in detail. Experimental results and comparisons with other methods are given in Section 4. Conclusions are drawn in Section 5.

Wireless sensor networks are used in a wide variety of fields. Driven by advances in microelectromechanical systems (MEMS) technology and low-cost networking, wireless sensor networks have developed rapidly and been widely adopted in recent years [13, 14]. These sensor networks promise to significantly improve and expand the quality of care across a wide range of applications, including air pollution monitoring, medicine and public health, and natural disaster prevention. Although a general two-tier network is considered a flat network and has a very simple structure, it has an inherent disadvantage in balancing the workload of its sensor nodes. When inner nodes deplete their batteries, they die and disconnect their outer nodes, interrupting the routing path from the outer nodes to a sink node. As a result, many nodes that still have sufficient energy to function are cut off from the network, and their information is no longer forwarded to a sink node. Alternatively, in a hierarchical network, all sensor nodes are clustered through some specific technique according to given protocols [15, 16]. Hierarchical networks help balance power consumption.

2.1. Genetic Algorithms

Genetic algorithms are a family of computational models inspired by natural evolution [17–20]. In a genetic algorithm, a population of candidate solutions to a problem is evolved toward better solutions. Each candidate solution, encoded as a binary string of 0s and 1s, is a chromosome whose genes can be mutated and modified. The basic genetic algorithm usually starts by generating several random chromosomes, evaluating each one, and keeping those with better fitness values. It then moves toward an optimal solution by randomly mutating and altering a predefined number of genes to generate new solutions, which are used in the next iteration. Commonly, the algorithm stops when it reaches a predefined number of iterations or time limit, or when a satisfactory solution is found. Genetic algorithms are widely used in many applications and are also combined with other methods to generate new optimal solutions [21, 22].
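The basic loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: binary tournament selection, one-point crossover, elitism, and the helper names `tournament` and `genetic_algorithm` are assumptions chosen for brevity. Only the bit-string encoding, the evaluate/select/vary cycle, and the stop-after-a-fixed-number-of-iterations rule come from the text.

```python
import random

def tournament(population, fitness, rng):
    """Binary tournament selection: return the fitter of two random chromosomes."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def genetic_algorithm(fitness, n_bits, pop_size=20, generations=100,
                      mutation_rate=0.02, seed=None):
    """Evolve bit-string chromosomes that maximize `fitness` (needs n_bits >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):                  # stop after a fixed budget
        next_gen = [best[:]]                      # elitism: best-so-far survives
        while len(next_gen) < pop_size:
            mother = tournament(population, fitness, rng)
            father = tournament(population, fitness, rng)
            cut = rng.randrange(1, n_bits)        # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]              # bit-flip mutation
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best
```

For example, `genetic_algorithm(sum, 20, seed=1)` evolves 20-bit strings toward all ones (the classic OneMax toy problem); because of elitism, the best fitness never decreases between generations.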

2.2. Space-Filling Curve

A space-filling curve is a one-dimensional curve that passes through an entire two- or higher-dimensional space; it is defined recursively and fills every point of that space as the number of iterations approaches infinity [23, 24]. Because Giuseppe Peano (1858–1932) was the first to discover such a construction, space-filling curves in two-dimensional planes are sometimes called Peano curves. Among the most celebrated are the Hilbert curve and the Sierpiński curve [23]. Space-filling curves are used in many fields. In 2014, Yan and Mostofi [24] used space-filling curves to schedule a data collection path for mobile robots; their goal was to minimize the total energy consumption, including the communication cost between the robot and the sensors and the motion cost of the robot. The study in [25] addresses how mobile sinks should move. A good moving trajectory for mobile sinks can reduce data loss and delivery delay, increase network lifetime, and better handle sparse networks. In that study, a dynamic Hilbert curve is used to design a trajectory for a mobile sink while achieving efficient network coverage, and the order of the curve varies with node density in the network. Simulation results show effective network coverage and good scalability.

For Hilbert curves, if a point within the unit square has coordinates (x, y) and d is the distance along the curve from its start to that point, then points with nearby d values also have nearby (x, y) values. The basic level-one (also called first-order) Hilbert curve is built on a 2 × 2 grid. A Hilbert curve is constructed recursively as follows: the network field is divided into four grid cells, and the one-level Hilbert curve is the line passing through the centers of those four cells in a specific order. To derive a two-level curve, each cell is replaced with a one-level curve, appropriately rotated and reflected. Each higher-level curve is derived in the same way from the curve one level below. Intuitively, the higher the level of the curve, the more precise the localization; however, more space is then needed to record positions, at greater cost. Figure 1 shows one-, two-, and three-order Hilbert curves. There are 4 points in a one-level Hilbert curve, 16 points in a two-level curve, and 64 points in a three-level curve; in general, a Hilbert curve of level $n$ has $4^{n}$ points.

2.3. Sudoku

Sudoku is a logic-based number-placement puzzle played on a square grid that is partitioned into equal-sized blocks of smaller cells. Each value from 1 up to the side length of the grid appears exactly once in each row, each column, and each block [26, 27]. In the classic puzzle, given a 9 × 9 grid, the goal is to fill it with the digits one to nine only, such that each row, each column, and each of the nine subgrids that compose the big grid contains all of the digits from one to nine. Although the 9 × 9 grid is by far the most commonly used, many other variations exist, such as 4 × 4 grids with 2 × 2 regions or 16 × 16 grids with 4 × 4 regions.

2.4. The k-Center Problem

The k-center problem is one of the fundamental facility location problems [28], and it is known to be NP-hard [29]. Given a graph with n vertices, the basic k-center problem requires placing k facilities in the graph so as to minimize the maximum distance from any vertex to the facility to which it is assigned. Several algorithms achieving a 2-approximation, the best factor possible unless P = NP, have been proposed for it. For a minimization problem, an algorithm is called an α-approximation algorithm if it runs in polynomial time and always outputs a value no more than α times the optimum. Because the k-center problem is so widely used, many of its variants have also been explored, for example, by adding special constraints on the positions of the centers. In 2015, Du et al. [30] explored an incremental version in which all the centers must lie on the boundary of a convex polygon. In the same year, Liang et al. [31] addressed the constraint of internal connectedness, which requires any two nodes in the same set to be linked by an internal path; this variant is called the connected k-center (CkC) problem. In [32], the authors presented a solution for the k-center problem, studied its generalizations, and noted that the dominating set problem is another special form of the k-center problem. Chechik and Peleg [33] studied another constrained version, the capacitated k-center problem, examined fault tolerance when one or more centers fail simultaneously, and proposed methods to address the problem.
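Farthest-first traversal (Gonzalez's algorithm) is the classic 2-approximation for the k-center problem and is the farthest first method referred to in the Introduction. A minimal sketch follows, assuming points are 2-D coordinate tuples with Euclidean distance. `farthest_first` solves the fixed-k version; `farthest_first_cover` is a hypothetical adaptation for the setting of this paper, where k is unknown and a coverage radius is given instead.

```python
import math

def farthest_first(points, k, start=0):
    """Gonzalez's farthest-first traversal for k-center.
    Returns (indices of the chosen centers, covering radius)."""
    centers = [start]
    dist = [math.dist(p, points[start]) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=dist.__getitem__)   # farthest point
        centers.append(far)
        dist = [min(d, math.dist(p, points[far])) for d, p in zip(dist, points)]
    return centers, max(dist)

def farthest_first_cover(points, radius, start=0):
    """Hypothetical variant for unknown k (radius >= 0): keep adding the
    farthest point until every point is within `radius` of some center."""
    centers = [start]
    dist = [math.dist(p, points[start]) for p in points]
    while max(dist) > radius:
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(far)
        dist = [min(d, math.dist(p, points[far])) for d, p in zip(dist, points)]
    return centers
```

On six collinear points at 0, 1, 2, 10, 11, and 12 with k = 2, the traversal picks the points at 0 and 12 and reaches radius 2, whereas the optimum (centers at 1 and 11) has radius 1; this is exactly the factor-2 gap the guarantee allows.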

3. HHSG Scheme Implementation

The flowchart of the HHSG scheme is shown in Figure 2. The scheme focuses on selecting k service sites from among the sensor nodes. Every sensor node is assigned two kinds of serial numbers. One is the node number (NID), which is unique and ranges from 1 to the number of sensor nodes. The other is a Huffman code; more than one node may share the same Huffman code.

The process runs as follows. First, encode the sensor nodes with Huffman codes and define the level of the Hilbert curve to be used. Second, choose an appropriate Sudoku size and order according to the communication radius and network scale, randomly select a digit from the Sudoku grid, and record the positions of that digit, encoding them as Huffman codes. Third, mark the nodes whose Huffman codes are the same as or similar to those of the recorded positions as potential sites, and initialize a chromosome using them. Fourth, repeat the second and third steps until all chromosomes are initialized. Fifth, find the best solution by applying mutation and crossover operations, and output the result.

3.1. Encoding and Defining Curve Order

A Huffman code is a prefix-free code: each bit string represents particular points and is never a prefix of the string for any other point. As shown in Figure 3, each position at a specific distance d, or a multiple of d, along the curve from the red starting point (Figure 3(b)) is assigned a Huffman code, and this code represents one or more real sensor nodes in the wireless sensor network. Because a three-level Hilbert curve has 64 positions, each code is a six-bit string. The encoding process is given below.

The field is divided into four grid cells, and each is assigned a two-bit binary number in the order upper left, lower left, lower right, upper right, with values 00, 01, 10, and 11, respectively. This coding order also applies to the inner subgrid cells. As shown in Figure 3(a), binary 10 (in red) is in the lower right, and all the sensor nodes located in this quarter of the area are prefixed with the two-bit code 10. This quarter is in turn divided into four smaller grid cells, numbered in the same order as the larger area. Binary 10 in blue marks 1/16th of the total area, and any node in this area has 10 as the second part of its Huffman code. As shown in Figure 3(b), the codes run from 0 at the red starting point to the blue point. The binary string for the red starting point, 0, is "00 00 00." The blue point is encoded as follows. The first two-bit string is 10 because the point lies in the lower-right quadrant. The second two-bit string is 11 because the point lies in the upper-right 1/16th area of that quadrant. The point lies in the lower-left 1/64th area of that subarea, which gives the suffix 01. Assembling these gives the binary string 10 11 01, which is 45 in decimal. Finally, the Hilbert curve gives every node a Huffman code [34–36].
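The quadrant numbering above can be reproduced with the standard iterative Hilbert-index routine (the well-known `xy2d` conversion, transcribed here to Python). This sketch rests on two assumptions. First, grid rows are counted downward from the top edge, which makes the level-one order upper left, lower left, lower right, upper right (00, 01, 10, 11). Second, sub-quadrants are rotated as in a true Hilbert curve. The names `hilbert_index` and `node_code` are hypothetical.

```python
def hilbert_index(order, col, row):
    """Hilbert index of cell (col, row) on a 2**order x 2**order grid,
    with rows counted from the top edge (standard xy2d algorithm)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if col & s else 0
        ry = 1 if row & s else 0
        d += s * s * ((3 * rx) ^ ry)          # two bits for this level's quadrant
        if ry == 0:                           # rotate/reflect the sub-quadrant
            if rx == 1:
                col, row = n - 1 - col, n - 1 - row
            col, row = row, col
        s >>= 1
    return d

def node_code(x, y, width, length, order):
    """Code of a node at (x, y) in a width x length field (y measured downward)."""
    side = 1 << order
    col = min(int(x / width * side), side - 1)
    row = min(int(y / length * side), side - 1)
    return hilbert_index(order, col, row)
```

With this convention, the cell in column 6, row 5 of an 8 × 8 (three-order) grid, that is, lower right, then upper right, then lower left, gets code 45, or `101101` in binary. The sketch also exhibits the one-way locality discussed below: the horizontally adjacent cells (3, 0) and (4, 0) receive codes 5 and 58.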

The Huffman code of a node depends on the order of the Hilbert curve. The largest code is 63 for a three-order curve and 1023 for a five-order curve. In a 100-node network with a five-order Hilbert curve, fewer than ten percent of the 1024 codes are actually used by sensors, which is very wasteful. Conversely, in a 600-node network with a three-order curve, more than ten sensor nodes can share the same Huffman code. Figure 4 shows how nodes are distributed over Huffman codes for Hilbert curves of different orders.

100 to 400 nodes randomly scattered in a 200 × 200 unit area and encoded with Huffman codes from four- or five-order Hilbert curves.
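Because each Hilbert index corresponds to exactly one grid cell, code usage can be measured simply by counting occupied cells, without computing the indices themselves. The sketch below assumes uniformly random node positions in a 200 × 200 field; the `code_usage` helper and its fixed seed are illustrative. It reports how many codes are used, how many exist, and the largest number of nodes sharing one code.

```python
import random
from collections import Counter

def code_usage(num_nodes, order, field=200.0, seed=0):
    """Scatter num_nodes (>= 1) nodes uniformly and report Hilbert-code usage.
    Each code is one grid cell, so counting occupied cells counts used codes."""
    rng = random.Random(seed)
    side = 1 << order                          # cells per row and per column
    per_cell = Counter()
    for _ in range(num_nodes):
        x, y = rng.uniform(0, field), rng.uniform(0, field)
        col = min(int(x / field * side), side - 1)
        row = min(int(y / field * side), side - 1)
        per_cell[(col, row)] += 1
    return len(per_cell), side * side, max(per_cell.values())
```

By the pigeonhole principle, 600 nodes on a three-order curve (64 codes) must place at least 10 nodes on some code, and 100 nodes can occupy at most 100 of the 1024 five-order codes, which is under ten percent.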

A space-filling curve has a locality property: any two points that are close in one-dimensional space are mapped to two points that are close in the original two- (or higher-) dimensional space, but the converse does not always hold. Some points have close coordinates but distant curve values, so two neighboring nodes may be far apart along the curve. Thus, if nodes are selected directly from the curve to initialize chromosomes, one chromosome may contain neighboring nodes. This is why the Sudoku is needed.

3.2. Sudoku Size and Order

This section describes how the size and order of the Sudoku are chosen. The size of a Sudoku is the number of cells in one row or one column. The order of a Sudoku is the level of a block constructed from several Sudoku grids of the same size. A single- or multiple-order Sudoku that maps appropriately onto the network is desired, so that each grid cell covers a suitable number of sensor nodes. For example, a single-order 9 × 9 Sudoku is not appropriate for a 600-node network in a 200 × 200 unit field with a ten-unit sensing radius. If the sensor nodes are deployed randomly and uniformly, more than 100 service sites will be needed in practice, but only nine positions can be chosen at a time to generate service site candidates. Therefore, the number of service sites must be estimated before the size and order of the Sudoku are chosen. One-order and two-order 9 × 9 Sudoku solutions are shown in Figure 5.
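A quick way to see why a single-order 9 × 9 Sudoku is too small is an area lower bound: no covering can use fewer disks than the field area divided by the area of one disk. The sketch below computes that bound and the number of positions one digit provides. `positions_per_digit` assumes that a k-order Sudoku tiles k × k grids of the same size. This matches the counts given for two-order grids (36 for 9 × 9, 64 for 16 × 16), but it is an inference rather than a stated definition; both function names are hypothetical.

```python
import math

def area_lower_bound(width, length, radius):
    """Lower bound on the number of disks of `radius` covering the field."""
    return math.ceil(width * length / (math.pi * radius ** 2))

def positions_per_digit(size, order):
    """Positions of one digit when a k-order block tiles k x k grids
    of size x size (an assumption consistent with 9 -> 36, 16 -> 64)."""
    return order * order * size
```

For the 600-node example, `area_lower_bound(200, 200, 10)` gives 128, already above 100 and far above the nine positions one digit offers in a single 9 × 9 grid. Because disks must overlap to cover a region, the real number of sites is higher still.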

Each digit randomly chosen from the Sudoku is labeled a target digit, and the nodes near that digit's positions are potential sites. One target digit in a one-order 9 × 9 Sudoku yields a candidate set of nine potential sites. In the experiment, higher-order Sudoku are usually used: one target digit in a two-order 9 × 9 Sudoku yields 36 potential sites, and, in the same way, one target digit in a two-order 16 × 16 Sudoku yields 64. The potential sites are used to initialize the genetic algorithm, but not all chromosomes are generated directly from them.
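The target-digit step can be sketched as follows. All of these choices are hypothetical. Each Sudoku grid is generated from the classic shifted-row pattern with randomly relabeled digits, which always yields a valid Sudoku. A k-order Sudoku is taken to be a k × k tiling of such grids laid over the square field. The potential site for each position is the node nearest to that cell's center. Node indices are 0-based (index = NID − 1).

```python
import math
import random

def sudoku_grid(size, rng):
    """A valid size x size Sudoku (size a perfect square) from the classic
    shifted-row pattern, with digits randomly relabeled."""
    b = math.isqrt(size)
    if b * b != size:
        raise ValueError("size must be a perfect square")
    labels = list(range(1, size + 1))
    rng.shuffle(labels)                    # relabeling keeps the grid valid
    return [[labels[(b * (r % b) + r // b + c) % size] for c in range(size)]
            for r in range(size)]

def target_positions(size, order, digit, rng):
    """(row, col) cells holding `digit` in an order x order tiling of grids."""
    cells = []
    for br in range(order):
        for bc in range(order):
            grid = sudoku_grid(size, rng)
            cells += [(br * size + r, bc * size + c)
                      for r in range(size) for c in range(size)
                      if grid[r][c] == digit]
    return cells

def potential_sites(nodes, size, order, digit, field, rng):
    """0-based indices of the nodes nearest to each target cell's center."""
    cell = field / (order * size)
    sites = set()
    for r, c in target_positions(size, order, digit, rng):
        center = ((c + 0.5) * cell, (r + 0.5) * cell)
        sites.add(min(range(len(nodes)),
                      key=lambda i: math.dist(nodes[i], center)))
    return sorted(sites)
```

Because every digit appears once per row of each grid, a single digit spreads its positions across the whole field. This is the property used to keep neighboring nodes out of the same initial chromosome.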

3.3. Initialization and Evaluation

A chromosome is represented by a binary string of 0s and 1s with one bit per sensor node. The bit order corresponds exactly to the NID order of the sensor nodes, so the string length equals the number of sensor nodes in the network. The string thus expresses the role of every node: a 1 represents a service site, and a 0 represents a common node. As shown in Figure 6, NIDs range from 1 to the number of nodes; some of the 1 bits come from the potential sites, and some extra 1 bits are added at random. Each chromosome obtains a fitness value from the fitness function, and the chromosome with the best fitness value is stored as the best solution found so far.
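Chromosome initialization as described above can be sketched like this. The name `init_chromosome` and the use of a fixed number of random extra 1 bits (`n_extra`) are illustrative assumptions, not details from the text; bit i corresponds to NID i + 1.

```python
import random

def init_chromosome(num_nodes, potential, n_extra, rng):
    """Bit i is 1 if the node with NID i + 1 is a service site."""
    bits = [0] * num_nodes
    for i in potential:                    # 1 bits taken from the potential sites
        bits[i] = 1
    zeros = [i for i, bit in enumerate(bits) if bit == 0]
    for i in rng.sample(zeros, min(n_extra, len(zeros))):
        bits[i] = 1                        # extra randomly added service sites
    return bits
```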

3.4. Fitness Function

A well-constructed fitness function can substantially increase the chance of finding a good solution. This section presents a new fitness function that includes four parameters. The first is the number of service sites in the field, which equals the number of 1 bits in a chromosome. For the second, the field is divided into cells according to its width and length (the size of the field) and the communication radius of the sensor nodes, and the number of cells in which service sites are located is counted. The fitness function is given in (1). One distance function gives the distance between a node and the sink node, and another gives the distance between a node and its closest service site. The set of sensor nodes and the set of service sites also appear in (1), and three coefficients in (1) are constants.
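The sketch below does not reproduce equation (1). It only computes the quantities the text says (1) combines: the number of service sites, the number of cells containing a service site, each node's distance to the sink, and each node's distance to its closest service site. The cell layout (square cells whose side equals the communication radius) and the name `fitness_terms` are hypothetical; weighting these terms with the three constant coefficients is left to (1).

```python
import math

def fitness_terms(nodes, chromosome, sink, width, length, radius):
    """Ingredients of fitness function (1); their weighting is not modeled."""
    sites = [p for p, bit in zip(nodes, chromosome) if bit == 1]
    # Hypothetical cell grid: square cells whose side equals the radius.
    cols = math.ceil(width / radius)
    rows = math.ceil(length / radius)
    occupied = {(min(int(x // radius), cols - 1), min(int(y // radius), rows - 1))
                for x, y in sites}
    to_sink = [math.dist(p, sink) for p in nodes]
    to_nearest_site = ([min(math.dist(p, s) for s in sites) for p in nodes]
                       if sites else [])
    return len(sites), len(occupied), to_sink, to_nearest_site
```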

3.5. Crossover and Mutation Process

Let the mother and father chromosomes be the parents; the crossover operation combines their bits to produce a child chromosome.

The mutation process inverts bit values in the chromosome with a small probability; here, the mutation rate is fixed at 0.02. The crossover and mutation processes are shown in Figures 7 and 8, respectively.
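The two operators can be sketched as below. Bit-flip mutation at a rate of 0.02 follows the text. One-point crossover is a common choice swapped in for illustration and may differ from the paper's own crossover rule. Both function names are hypothetical.

```python
import random

def one_point_crossover(mother, father, rng):
    """Child takes the mother's bits before a random cut and the father's after it."""
    cut = rng.randrange(1, len(mother))    # needs chromosomes of length >= 2
    return mother[:cut] + father[cut:]

def mutate(chromosome, rng, rate=0.02):
    """Bit-flip mutation: invert each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]
```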

3.6. Remedy Process

As mentioned above, each chromosome represents a candidate assignment of service sites and common nodes in this model. Each chromosome must satisfy validity and feasibility requirements; in other words, the 1 bits representing service sites must cover all the 0 bits representing common nodes. If they do not, the chromosome must be repaired. The following method is used in this experiment to revise invalid chromosomes. First, if an invalid chromosome arises during initialization, it is regenerated. Second, the uncovered bits are changed to 1. Third, the uncovered 0 bits are listed; one of them is changed to 1 at a time, and all other bits dominated (covered) by it are removed from the list. This process is repeated until the list is empty.
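The third repair step is a greedy cover. The sketch below assumes that a common node is covered when it lies within the communication radius of some service site (Euclidean distance). The function name `repair` is hypothetical, and the list is processed in NID order, which the text does not specify.

```python
import math

def repair(chromosome, nodes, radius):
    """Greedy repair: promote uncovered nodes to service sites until all are covered."""
    bits = list(chromosome)
    sites = [nodes[i] for i, bit in enumerate(bits) if bit == 1]
    uncovered = [i for i, bit in enumerate(bits)
                 if bit == 0 and all(math.dist(nodes[i], s) > radius for s in sites)]
    while uncovered:
        j = uncovered.pop(0)
        bits[j] = 1                            # this node becomes a service site
        uncovered = [i for i in uncovered      # drop the nodes it dominates
                     if math.dist(nodes[i], nodes[j]) > radius]
    return bits
```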

4. Experiment Results

The simulation environment is a 200 × 200 unit area in which 100, 200, 300, and 400 sensor nodes are scattered randomly, with communication radii of 40, 30, 25, and 20 units, respectively. Six methods are implemented: FF, HL, HD, DO, GA, and HHSG. FF is the farthest first traversal, HL and HD are the methods of Harel and Koren [37], DO is a heuristic algorithm for the minimum dominating set problem [38], GA is the original genetic algorithm, and HHSG is the proposed scheme.

4.1. Parameters

The order and size of the Hilbert curve and the Sudoku play an important role in generating potential sites, so different parameter settings produce different results. Table 1 shows the final parameter values used in this experiment.

DCT records the number of distance computations performed by each method; a distance may be node to node or node to service site. Table 2 lists the DCT values for FF, HL, and HD, the three nonevolutionary algorithms. The values are computed from simple formulas, and the actual values may be smaller because some of the methods use pruning strategies. As shown in Table 2, HL has the lowest DCT value. The DCT values of HL, FF, and HD are used as three baselines in the later experiments.

4.2. Influence of the Hilbert and Sudoku

In the experiment, HHSG was executed on a 200-node network with different Hilbert curve and Sudoku parameters. In Table 3, each hyphenated triple gives the Hilbert curve order, the Sudoku size, and the Sudoku order used in the test; for example, 5-16-1 denotes a five-order Hilbert curve with a one-order 16 × 16 Sudoku. The first column lists the DCT level. HHSG produces different results with different parameters; with a five-order Hilbert curve, the results are clearly better than in the other two cases.

4.3. Comparison of GA and HHSG

GA and HHSG both belong to the larger class of evolutionary algorithms, which can reach high-quality solutions through many optimization iterations. To further compare their rates of evolution, the following tests were made. In Table 4, GA and HHSG were run in 100-node to 400-node networks with the same number of iterations. In the 100-node network, HHSG used only 13 service sites to cover the field, two fewer than GA, and its advantage is even more obvious in the 400-node network.

To test the stability of HHSG and GA, each method was run sixteen times under the same conditions, except for the random numbers used. Table 5 lists the standard deviation (SD), best value (best), average (AG), and worst value (worst) for comparison. The standard deviation stays below one for HHSG, whereas for GA it reaches seven in the 400-node network, which clearly illustrates the stability of the HHSG scheme. The table also shows that even the worst result of HHSG is better than the results of GA in the 200-node, 300-node, and 400-node networks.
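The four statistics in Table 5 can be computed as below. The text does not state whether the population or the sample standard deviation is used; this sketch assumes the population form (`statistics.pstdev`), and `summarize` is a hypothetical helper. For service-site counts lower is better; pass `lower_is_better=False` for fitness values.

```python
import statistics

def summarize(runs, lower_is_better=True):
    """SD, best, AG (average), and worst over repeated runs of one method."""
    if lower_is_better:
        best, worst = min(runs), max(runs)
    else:
        best, worst = max(runs), min(runs)
    return {"SD": statistics.pstdev(runs), "best": best,
            "AG": statistics.mean(runs), "worst": worst}
```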

4.4. Overall Evaluation

Tables 6 and 7 list the number of service sites and the fitness values obtained by the six methods. A larger fitness value is better, and a smaller number of service sites is better. Although HL yields the same number of service sites as HHSG, its fitness value in the 400-node network is lower, and HHSG outperforms HL overall in the other network sizes. The remaining methods yield worse results than HHSG in both the number of service sites and the fitness value. In Figure 9, HHSG1 plots the evolution of the fitness value for a 100-node network using the HHSG scheme, HHSG2 plots the same for a 200-node network, and the curves for the other schemes are labeled accordingly. The advantage of HHSG is clear.

Figures 10–13 show the simulated networks after service sites are added. Sensor nodes are randomly scattered in the field, and the sink node is deployed at the center of the area. The sink node and its links to the service sites are shown for the 100-node and 200-node networks; in the 300-node and 400-node networks, there are too many lines between service sites and the sink node to plot clearly.

5. Conclusion

Energy load balancing is critical to extending the lifespan of wireless sensor networks, in addition to ensuring continued functionality and avoiding communication interruptions caused by dead nodes. Wireless sensor networks usually consist of hundreds or even thousands of sensor nodes scattered randomly in adverse, remote, or dangerous environments, with only nonrechargeable, nonreplaceable batteries to power each node. Thus, energy conservation for individual nodes is important, but equally important is the energy efficiency of the overall network. Traditional two-tier networks with one sink are vulnerable to energy holes, which cut off many nodes from the sink when one inner node dies.

This paper therefore proposes a solution, HHSG, that minimizes the construction cost of a three-tier network and takes full advantage of node energy. A Hilbert curve of suitable order is chosen for each network size, Huffman codes are assigned to the nodes, and the chromosomes of a genetic algorithm are initialized using a Sudoku puzzle. Furthermore, five other methods are tested in the experiments for comparison with the proposed method. The experiments examine the relationship between the Hilbert order and the sensor nodes' Huffman codes, and how the results converge under different Hilbert and Sudoku orders. They also compare the service site performance of the other five methods with that of HHSG. Importantly, this paper reports the standard deviation (SD), best value (best), average (AG), and worst value (worst) of each method to compare their stability and benefits. The standard deviation stays below one for HHSG, which demonstrates the stability of the algorithm. Simulation results show the superior performance of the proposed method, which builds a stable three-tier network using fewer service sites than the other methods. Regarding cost, because HHSG is a centralized algorithm, its main overhead is likely the communication needed to collect global information before the algorithm runs; however, this cost is common to all centralized algorithms for this problem. Other possible costs are computation time and memory space. However, centralized algorithms usually run on a resource-rich machine, so computing power and memory space are not the most important considerations when solving the problem centrally.

Symbols

:The number of sensor nodes in a network
:The number of chromosomes in a population
:The number of chromosomes generated by
NID:Node numbering, unique for each node, ranging from 1 to the number of sensor nodes
:Huffman code for one sensor node
:The positions of one digit in a Sudoku grid
:Nodes near a target position, which are potential sites.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is partially supported by the Fujian Provincial Natural Science Foundation, China (2017J01730), the Key Project of Fujian Education Department Funds, China (JA15323), and Shenzhen Innovation and Entrepreneurship Project no. GRCK20160826105935160.