Abstract

Water pollution detection is of great importance in water conservation. In this paper, the water pollution detection problems of the network and of the node in sensor networks are discussed, both for normally and for nonnormally distributed monitoring noise. The pollution detection problems are first analyzed based on hypothesis testing theory; then, specific detection algorithms are given. Finally, two implementation examples illustrate how the proposed detection methods are used for water pollution detection in sensor networks and demonstrate their effectiveness.

1. Introduction

Water is the most important material for human survival and a valuable resource for industrial and agricultural production. With the development of the economy and industry, more kinds of pollutants are discharged into water bodies such as rivers and lakes, and water pollution disasters have become more frequent. Detecting pollution in time is important for water conservation and is a precondition for locating the pollution source.

In most applications of pollution monitoring and pollution source localization with sensor networks, pollution is considered detected when nodes report concentration values larger than a given threshold; see, for example, the works on pollution monitoring [1-8] and on pollution source localization [9-12].

Because water contains an initial pollutant concentration from normal production and daily life, the fact that sensor nodes register relevant readings does not imply that there is pollution generated by a pollution source. In addition, plankton, garbage, aquatic animals, plants, and so forth interfere with water pollution monitoring and disturb the monitoring data. As a result, the decision threshold of the simple threshold-based detection method is difficult to set properly.

In this paper, hypothesis testing is adopted to solve the water pollution detection problems. First, the monitoring sensor network and its pollution detection problems are briefly described. Second, theoretical approaches to these detection problems are analyzed based on hypothesis testing. Third, specific detection algorithms are given. Finally, implementation examples illustrate the proposed pollution detection methods.

2. Problem Statement

2.1. Network Deployment

A self-organizing sensor network is used for water pollution monitoring. More than five sensor nodes are deployed uniformly in the monitoring area, and detailed information about the pollutant to be monitored is known in advance. The detection sensors, which extend into the water, are identical. The locations of the nodes are fixed. The sensor nodes know their own positions, and all static nodes in the network sample and store concentration values synchronously at the same time interval. Background information such as the diffusion coefficient, the water depth, and the sampling interval is known in advance. The monitoring information is routed to the sink node and processed by the data processing center. The network deployment is shown in Figure 1.

2.2. The Detection Problems

The pollution detection problem of the network is to determine whether the sensor network has found the pollution. More specifically, the static nodes sample and store concentrations synchronously at a fixed time interval, and at each sampling time the samples of all nodes are used to decide, at a given significance level, whether there is pollution.

The purpose of the pollution detection of the network is to detect the pollution in time.

The pollution detection problem of the node is for each node in the network to determine whether it has obtained concentration information about the pollution source. More specifically, all static nodes in the network sample and store concentrations synchronously at a fixed time interval, and at a given significance level each node decides, from its own samples collected up to the current sampling time, whether it has found the water pollution.

Diffusion in the concentration field is slow. When the network detects the pollution, not every sensor node has necessarily detected it; as time passes, more and more sensor nodes detect the pollution. In pollution source localization, a node can be used only after it has detected the pollution.

3. Pollution Detection Based on Hypothesis Testing

In [13], the present authors briefly discussed water pollution detection under the assumption that the monitoring noise is normally distributed and that its distribution is known in advance. In this paper, the pollution detection problems are discussed in more general cases.

Assume that the initial pollutant concentration in water (the concentration due to normal production and domestic sewage) is $c_0$. If there is no diffusion source, the measurement of node $i$ at time $t$ is $z_i(t) = c_0 + \varepsilon_i(t)$. If the node has detected the pollution, $z_i(t) = c_0 + c_i(t) + \varepsilon_i(t)$, where $\varepsilon_i(t)$ is the measurement noise of the sensor node and $c_i(t)$ is the theoretical concentration value related to the pollution source.

Remark 1. The concentration varies with time and location. The specific forms of water pollution diffusion can be found in [14].

3.1. Distribution Test

The specific hypothesis testing problems differ with the statistical distribution of the samples. Therefore, the first step in water pollution detection is to determine whether the observation noise is normally distributed.

In the initial state, only a few nodes, or none, perceive the pollution. When there is no pollution, each measurement is the initial concentration plus noise, so if the noise is normally distributed, the measurement is a normal variable.

To save cost, the number of sampling nodes is often limited in practical applications. Therefore, the Shapiro-Wilk test [15], which is suited to small samples, is used here as the distribution test.

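Step 1. Denote the monitoring values of the $n$ nodes at the sampling time under test by $x_1, x_2, \ldots, x_n$, and sort them in ascending order:

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}. \qquad (1)$$

Step 2. Calculate the test statistic

$$W = \frac{\left[\sum_{i=1}^{\lfloor n/2 \rfloor} a_i \left(x_{(n+1-i)} - x_{(i)}\right)\right]^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}, \qquad (2)$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $a_i$ are the Shapiro-Wilk coefficients.

Step 3. At a given significance level $\alpha$, if $W < W_\alpha$, the distribution is nonnormal; otherwise, the distribution is regarded as normal.

In the above steps, the coefficients $a_i$ and the critical value $W_\alpha$ are obtained from tables [16].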
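The three steps above can be sketched in Python with SciPy's `scipy.stats.shapiro`. This is a minimal sketch, not the authors' implementation: SciPy computes the statistic with Royston's approximation and reports a p-value instead of the tabulated critical value used in Step 3, so "reject normality when the p-value is below α" stands in for the table rule. The function name `noise_is_normal` and the default α = 0.05 are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def noise_is_normal(readings, alpha=0.05):
    """Shapiro-Wilk distribution test on the node readings at one sampling time.

    Returns True when normality is NOT rejected at level alpha.
    SciPy reports a p-value (Royston's approximation) rather than a
    tabulated critical value, so "p < alpha" replaces the table lookup.
    """
    x = np.asarray(readings, dtype=float)
    if x.size < 3:
        raise ValueError("the Shapiro-Wilk test needs at least 3 samples")
    _, p_value = stats.shapiro(x)
    return bool(p_value >= alpha)


# Hypothetical readings from 10 nodes: nine equal values and one outlier.
print(noise_is_normal([0.0] * 9 + [10.0]))  # False (normality rejected)
```

If the function returns False, the nonparametric branch (Case 2 below) would be used.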

The specific detection problems differ depending on whether the sensing values are normally distributed. The detection methods for the two cases are discussed below.

3.2. The Pollution Detection of the Network

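Case 1 (hypothesis testing under normal distribution). By the properties of the normal distribution, if no node has detected the pollution at the current sampling time, it can be deduced that the noise is normally distributed and, consequently, that each monitoring value is a normal variable whose mean is the pollution-free concentration.
The mean value of the monitoring values is investigated, with the hypotheses given by (3). As the noise variance is unknown, the test statistic (4) is the Student t statistic computed from the sample mean and the sample standard deviation of the monitoring values.
At the significance level $\alpha$ [17], if the statistic satisfies the rejection criterion (5), $H_0$ is rejected, and it is deduced that there is a pollution source. Here, the critical value in (5) is a quantile of the t-distribution [17, 18].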
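Assuming that problem (3) is the one-sided test of H0: mean = c0 against H1: mean > c0, where c0 is the known pollution-free concentration, the t test of Case 1 can be sketched as follows. The function name `t_test_exceeds_baseline` and the parameter `c0` are hypothetical; SciPy is used only for the t quantile.

```python
import math
from statistics import mean, stdev

from scipy import stats


def t_test_exceeds_baseline(readings, c0, alpha=0.05):
    """One-sided one-sample t test (assumed form of the network test).

    H0: the mean reading equals c0 (no pollution).
    H1: the mean reading is larger than c0 (pollution).
    Returns (t statistic, critical value t_alpha(n - 1), reject H0?).
    """
    n = len(readings)
    if n < 2:
        raise ValueError("at least two readings are required")
    s = stdev(readings)  # sample standard deviation, divisor n - 1
    if s == 0:
        raise ValueError("readings have zero spread; t statistic undefined")
    t_stat = (mean(readings) - c0) / (s / math.sqrt(n))
    t_crit = stats.t.ppf(1 - alpha, df=n - 1)  # upper alpha quantile
    return t_stat, t_crit, bool(t_stat > t_crit)


# Hypothetical readings of five nodes against a baseline of 0.
t_stat, t_crit, polluted = t_test_exceeds_baseline([1.0, 2.0, 3.0, 4.0, 5.0], c0=0.0)
print(round(t_stat, 3), round(float(t_crit), 3), polluted)  # 4.243 2.132 True
```

A one-sided alternative is chosen here because pollution can only raise the concentration above the baseline; if (3) is two-sided in the original, the quantile would be taken at 1 - α/2 instead.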

Case 2 (hypothesis testing under nonnormal distribution). When the sample distribution is nonnormal, it is difficult to identify the specific distribution that the values follow. In this case, the Wilcoxon rank sum test is applied directly [17].
The test problem is to determine whether there is a significant difference between the two groups of independent samples in Table 1. The hypotheses (6) are $H_0$: there is no significant difference between the two groups, versus $H_1$: there is a significant difference. Pool the data, list them in ascending order, and assign ranks according to this order. Let $\alpha$ be the significance level of the test and $T$ the sum of the ranks of Sample 1.
When $T \le T_L$ or $T \ge T_U$ (7), $H_0$ is rejected, and there is a pollution source in the monitoring area. Here, $T_U$ and $T_L$ are the upper and lower tail values of the two-tailed rank sum test [17, 18].
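The rank sum procedure can be sketched in plain Python. Instead of reading the tail values from a printed table, this sketch derives them from the exact null distribution of the rank sum by enumerating every possible set of ranks for Sample 1; this is practical only for small groups, and the tail values assume no ties (tied observations still receive mid-ranks in the statistic). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rank_sum(sample1, sample2):
    """Sum T of the ranks of sample1 in the pooled data sorted ascending.

    Tied values receive the average (mid-rank) of the positions they occupy.
    """
    pooled = sorted(list(sample1) + list(sample2))
    counts = Counter(pooled)
    first_pos = {}
    for pos, value in enumerate(pooled, start=1):
        first_pos.setdefault(value, pos)
    mid_rank = {v: first_pos[v] + (counts[v] - 1) / 2 for v in counts}
    return sum(mid_rank[v] for v in sample1)


def exact_tail_values(n1, n2, alpha):
    """Lower/upper tail values (T_L, T_U) of the two-tailed rank sum test.

    Built from the exact null distribution of T (no ties): each tail holds
    at most alpha/2 probability. H0 is rejected when T <= T_L or T >= T_U.
    """
    n = n1 + n2
    dist = Counter(sum(c) for c in combinations(range(1, n + 1), n1))
    total = sum(dist.values())
    limit = alpha / 2 * total + 1e-9  # small tolerance for float rounding
    t_min = n1 * (n1 + 1) // 2
    t_max = n1 * (2 * n - n1 + 1) // 2

    t_low, cum = t_min - 1, 0  # t_min - 1 means "no lower rejection region"
    for t in range(t_min, t_max + 1):
        cum += dist[t]
        if cum > limit:
            break
        t_low = t

    t_up, cum = t_max + 1, 0  # t_max + 1 means "no upper rejection region"
    for t in range(t_max, t_min - 1, -1):
        cum += dist[t]
        if cum > limit:
            break
        t_up = t
    return t_low, t_up


def rank_sum_test(sample1, sample2, alpha=0.05):
    """Return (T, (T_L, T_U), reject H0?) for the two-tailed rank sum test."""
    t = rank_sum(sample1, sample2)
    t_low, t_up = exact_tail_values(len(sample1), len(sample2), alpha)
    return t, (t_low, t_up), t <= t_low or t >= t_up


print(rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], alpha=0.1))
# (6.0, (6, 15), True)
```

The enumeration grows with the binomial coefficient C(n1 + n2, n1), which is still small for groups of about ten readings each.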

3.3. The Pollution Source Detection of the Node

Based on the monitoring values of a given node, determine whether the node has detected the pollution source.

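Case 1 (hypothesis testing under normal distribution). If the sample noise is normal and the node has not detected the pollution by the current sampling time, its monitoring values are normal variables whose mean is the pollution-free concentration.
The average of the node's monitoring values over its sampling times is investigated, with the hypotheses given by (8). The test statistic (9) is the Student t statistic computed from the sample mean and the sample standard deviation of the node's samples.
If the statistic satisfies the rejection criterion (10), $H_0$ is rejected, and it is deduced that the node has detected the pollution. Here, the critical value in (10) is a quantile of the t-distribution, and $\alpha$ is the significance level of the test [17, 18].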

Case 2 (hypothesis testing under nonnormal distribution). The Wilcoxon rank sum test is used. The test problem is to determine whether there is a significant difference between the two groups of independent samples in Table 2.
The hypotheses (11) have the same form as (6), and problem (11) is solved with the same method as problem (6).

3.4. Sample Size Requirements in Detection
3.4.1. Basic Requirements

According to the basic sample size requirements of the hypothesis testing methods [17, 18], the basic sample size requirements of our detection methods are as follows. In the distribution test, the number of samples must meet the minimum required by the Shapiro-Wilk test. When the sample noise is normally distributed, at least 4 samples are needed in both the pollution detection of the network and the pollution detection of the node; that is, the number of nodes and the number of sampling times must each be at least 4. When the distribution is nonnormal, at least 6 samples are needed in both cases, so the number of nodes and the number of sampling times must each be at least 6.

3.4.2. The Power of Tests

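(A) t Test. For hypothesis testing under the normal distribution, the OC function (12) of the t tests in (3) and (8) depends on the sample size, the sample mean, the sample standard deviation, and the mean under the null hypothesis; for the pollution detection of the network these quantities are taken from (3) and (4), and for the pollution detection of the node from (8) and (9).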

In the pollution detection of the network, the number of nodes is usually fixed in advance to reduce cost, so the number of sensor nodes cannot exceed a given maximum. For the node to take part in pollution source localization in time, the pollution detection of the node should not require many sampling times either, and the maximum number of samples is often given as well.

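To reduce the probability of false alarm while meeting the given power-related thresholds in the hypothesis testing problems (3) and (8), the significance level $\alpha$ should satisfy constraint (13) when the number of nodes and the number of sampling times are limited to their given maxima.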
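The OC function of a one-sided t test can be evaluated with the noncentral t-distribution. Assuming the test H0: μ = μ0 against H1: μ > μ0, the type II error probability at a standardized shift δ = (μ - μ0)/σ is P(T < t_α(n - 1)), where T follows a noncentral t-distribution with n - 1 degrees of freedom and noncentrality δ√n. The sketch below uses this to find the smallest sample size meeting a given β and, for a fixed maximum sample size, the smallest α that still meets β; every α above that value also meets it, which is one way to obtain an admissible range of significance levels. The function names and the bisection bounds are illustrative assumptions.

```python
import math

from scipy import stats


def oc_value(n, delta, alpha):
    """Type II error probability of the one-sided t test (assumed form).

    delta = (mu - mu0) / sigma is the standardized shift of the true mean.
    Under that shift, the t statistic follows a noncentral t-distribution
    with n - 1 degrees of freedom and noncentrality delta * sqrt(n).
    """
    t_crit = stats.t.ppf(1 - alpha, df=n - 1)
    return float(stats.nct.cdf(t_crit, df=n - 1, nc=delta * math.sqrt(n)))


def min_sample_size(delta, alpha, beta, n_max=100):
    """Smallest n (>= 2) whose OC value at delta is at most beta, else None."""
    for n in range(2, n_max + 1):
        if oc_value(n, delta, alpha) <= beta:
            return n
    return None


def min_alpha(n, delta, beta, tol=1e-6):
    """Smallest alpha in (0, 0.5] (to within tol) with OC <= beta at delta.

    The OC value decreases as alpha grows, so bisection on alpha works and
    every alpha above the returned value also meets the power requirement.
    """
    lo, hi = 1e-9, 0.5
    if oc_value(n, delta, hi) > beta:
        return None  # the requirement cannot be met with n samples
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if oc_value(n, delta, mid) <= beta:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical requirement: detect a one-sigma shift with beta = 0.1.
print(min_sample_size(1.0, alpha=0.05, beta=0.1))
print(min_alpha(10, 1.0, beta=0.2))
```

To keep false alarms low, one would then pick α at or just above the value returned by `min_alpha` for the given maximum sample size.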

(B) The Nonparametric Test. Nonparametric tests have no explicit expression for the test power. When the maximum number of samples is given, reducing the probability of a type I error often increases the probability of a type II error [19], so an appropriate significance level must be chosen. In nonparametric tests, a fixed conventional significance level is often adopted.

4. The Detection Algorithms

Based on the above theoretical analysis, the pollution source detection algorithms are as follows.

Algorithm 1 (the pollution detection of the network).
Preconditions. The number of available samples is sufficiently large and known. The samples at the first sampling time and at the detection time are known. The two parameters related to the test power are given.

Step 1. At the first sampling time, use the Shapiro-Wilk test (2) to test the distribution of the sample noise.

Step 2. If the distribution is normal, obtain the admissible range of the significance level from (13), choose any significance level within that range, calculate the test statistic (4), and go to Step 3. If the distribution is nonnormal, go to Step 4.

Step 3. When the test statistic satisfies the test criterion (5), there is pollution; otherwise, there is no pollution.

Step 4. Pool the data, list them in ascending order, and assign ranks according to the order. Calculate the sum of the ranks of Sample 1 in Table 1, and look up the tail values of the rank sum test in the table. If the rank sum satisfies criterion (7), there is pollution; otherwise, there is no pollution.
If the network does not detect the pollution at the current sampling time, the pollution detection of the network is repeated at the next sampling time.
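Algorithm 1 can be sketched end to end in Python with SciPy. Several choices here are assumptions rather than the paper's specification: the normal branch is taken to be a one-sided t test of the current readings against a known pollution-free concentration `c0`; the nonnormal branch compares the first-time readings (taken as Sample 1) with the current readings (Sample 2) using `scipy.stats.mannwhitneyu`, whose U statistic is a linear function of the Wilcoxon rank sum, with a p-value in place of the table lookup; and one α is used throughout instead of a value chosen from (13). The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def network_detects(first_samples, current_samples, c0, alpha=0.05):
    """One pass of the network-level detection at a single sampling time."""
    first = np.asarray(first_samples, dtype=float)
    current = np.asarray(current_samples, dtype=float)

    # Step 1: Shapiro-Wilk test on the readings of the first sampling time.
    _, p_normal = stats.shapiro(first)

    if p_normal >= alpha:
        # Steps 2-3: one-sided t test of the current mean against c0.
        n = current.size
        t_stat = (current.mean() - c0) / (current.std(ddof=1) / np.sqrt(n))
        return bool(t_stat > stats.t.ppf(1 - alpha, df=n - 1))

    # Step 4: two-tailed rank sum comparison of Sample 1 and Sample 2.
    _, p_rank = stats.mannwhitneyu(first, current, alternative="two-sided")
    return bool(p_rank < alpha)


def first_detection_index(samples_by_time, c0, alpha=0.05):
    """Repeat the test at successive sampling times until pollution is found.

    samples_by_time[k] holds the readings of all nodes at the k-th sampling
    time (k = 0 is the first). Returns the first detecting index, or None.
    """
    first = samples_by_time[0]
    for k, current in enumerate(samples_by_time):
        if network_detects(first, current, c0, alpha):
            return k
    return None
```

The loop mirrors the last line of the algorithm: a failed test at one sampling time simply moves the detection to the next one.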

Algorithm 2 (the pollution detection of a node).
Preconditions. The number of samples is sufficiently large and known. The samples at the first sampling time are known, as are the samples of the node under detection. The two parameters related to the test power are given.

Step 1. At the first sampling time, use the Shapiro-Wilk test (2) to test the distribution of the sample noise.

Step 2. If the distribution is normal, obtain the admissible range of the significance level from (13), choose any significance level within that range, calculate the test statistic (9), and go to Step 3. If the distribution is nonnormal, go to Step 4.

Step 3. When the test statistic satisfies the test criterion (10), the node detects the pollution; otherwise, the node fails to detect the pollution.

Step 4. Pool the data, list them in ascending order, and assign ranks according to the order. Calculate the sum of the ranks of Sample 3 in Table 2, and look up the tail values of the rank sum test in the table. If the rank sum satisfies criterion (7), the node detects the pollution; otherwise, the node fails to detect the pollution.
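Algorithm 2 differs from Algorithm 1 mainly in that a node tests its own, growing series of samples. The sketch below assumes that the reference group in Table 2 is the network's first-time readings, that the normal branch is a one-sided t test of the node's samples against a known pollution-free concentration `c0`, and that testing starts only once the basic sample sizes of Section 3.4.1 (4 or 6 samples) are available. `scipy.stats.mannwhitneyu` again stands in for the table-based rank sum test, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def node_detects(window, reference, c0, normal, alpha=0.05):
    """Test one node's readings collected so far (its current window)."""
    x = np.asarray(window, dtype=float)
    if normal:
        # Assumed form of (8)-(10): one-sided t test of the window mean against c0.
        t_stat = (x.mean() - c0) / (x.std(ddof=1) / np.sqrt(x.size))
        return bool(t_stat > stats.t.ppf(1 - alpha, df=x.size - 1))
    # Nonnormal case: rank sum comparison of the reference group and the node's samples.
    _, p = stats.mannwhitneyu(reference, x, alternative="two-sided")
    return bool(p < alpha)


def node_detection_index(node_series, reference, c0, alpha=0.05):
    """Index of the first sampling time at which the node detects pollution, or None."""
    _, p_normal = stats.shapiro(np.asarray(reference, dtype=float))
    normal = p_normal >= alpha
    min_k = 4 if normal else 6  # basic sample sizes stated in Section 3.4.1
    for k in range(min_k, len(node_series) + 1):
        if node_detects(node_series[:k], reference, c0, normal, alpha):
            return k - 1  # zero-based index of the k-th sample
    return None
```

Because the window keeps its early, pollution-free samples, detection in this sketch typically needs several elevated readings before the mean of the window rises far enough.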

5. Implementation Examples

Experiment 1. A simulation is carried out to test the proposed detection algorithms, with normally distributed monitoring noise.
Background. The static shallow water area is 10 m × 10 m, and its average depth is known. Relative to the impervious shore, a continuous source is located at a given position (m). Starting from a given time (h), a solution containing the pollutant is injected into the water; the pollutant mass (kg) and the diffusion coefficient (m²/h) are known. The nodes in the network sample the concentration uniformly at a fixed interval (h). The diffusion is described by the model (14), in which $t$ is the current time. The monitoring nodes are located at (2.05, 5.55), (1.05, 4.55), (0.05, 5.55), (1.05, 6.55), (0.55, 6.05), (1.55, 6.05), (0.55, 5.05), (1.55, 5.05), (0.05, 6.25), and (1.05, 7.05). The monitoring values in the experiment are the values simulated from (14) plus normally distributed noise. The two parameters related to the test power are given, and the network has 10 nodes.

(A) The Pollution Detection of the Network. From constraint (13) and the sample size table of the t test in [18], the admissible significance levels under the given parameters can be deduced; they are listed in Table 3.

For different parameter values and significance levels, pollution detection is performed at the initial observation time 0.01 h, and the results are shown in Table 3. The hypothesis testing method under the normal distribution is used; each entry in the table gives the number of times, out of 100 experiments, that the pollution is detected successfully.

The results show that the network can detect the pollution shortly after it is released.

(B) The Pollution Detection of the Node. Whether the node at (1.05, 7.05) has detected the pollution source is determined from its observed data, shown in Table 4. The method is compared with the simple detection method, in which the pollution source is considered detected when the monitoring value is larger than a given threshold; the results are shown in Table 5.

In Table 5, “—” represents no result. The comparison shows that the hypothesis-testing method is more stable when an appropriate significance level is chosen, whereas the simple method needs as small a threshold as possible to detect the pollution source in time. However, once the noise of practical applications is taken into account, small thresholds can cause high false alarm rates.
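To make the false-alarm argument concrete, the sketch below computes how often a simple threshold rule would fire on pollution-free readings, assuming each reading is a baseline `c0` plus independent normal noise with standard deviation `sigma`, and that an alarm is raised if any of several readings exceeds the threshold. The function name and all numbers are illustrative assumptions, not values from Tables 4 and 5.

```python
from statistics import NormalDist


def threshold_false_alarm_rate(threshold, c0, sigma, n_readings=1):
    """Probability that a simple threshold rule raises a false alarm.

    Assumes pollution-free readings c0 + noise, with independent N(0, sigma^2)
    noise, and an alarm if ANY of n_readings readings exceeds the threshold.
    """
    p_single = 1.0 - NormalDist(mu=c0, sigma=sigma).cdf(threshold)
    return 1.0 - (1.0 - p_single) ** n_readings


# Lowering the threshold toward the baseline quickly raises the false alarm rate.
for h in (1.3, 1.2, 1.1, 1.05):
    print(h, round(threshold_false_alarm_rate(h, c0=1.0, sigma=0.1, n_readings=10), 4))
```

A hypothesis test instead fixes the false alarm probability through the significance level, which is why it behaves more predictably than a hand-tuned threshold.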

Experiment 2. A practical experiment is carried out to test the proposed detection algorithms.
Background. The water area is 200 cm × 200 cm with a known average depth, and there is a continuous source at the boundary. Starting from a given time (s), a MgSO4 solution is discharged continuously into the water. The node deployment is depicted in Figure 2, and the monitoring values of the sensor nodes are shown in Table 6.
Distribution Verification. At both 5 s and 10 s, and for each of the three significance levels tested, the results show that the monitoring data are not normally distributed. Therefore, the detection methods based on the Wilcoxon rank sum test are used.

(A) The Pollution Detection of the Network. At the given significance level, the network detects the pollution at 30 s.

(B) The Pollution Detection of the Node. At the given significance level, the times at which nodes 0, 1, 2, 3, 4, 5, 6, and 7 detect the pollution are shown in Table 7. The detection method follows (11).

The results show that a node can detect the pollution only after it has collected several samples with increasing concentration.

The experimental results show that an appropriate decision threshold is hard to set for the simple detection method, so pollution source detection using hypothesis testing is preferable. Corresponding detection algorithms are available whether or not the sample noise is normally distributed.

6. Conclusions

Water pollution detection is important in water environment monitoring. In this paper, the pollution source detection problems of the network and of the node are discussed based on hypothesis testing, and the sample size requirements of the different detection problems are analyzed. The proposed pollution detection algorithms are tested in implementation examples, which demonstrate their effectiveness. This work focuses mainly on theoretical detection approaches based on hypothesis testing. Future work will study further problems that arise when the proposed detection algorithms are adopted in practice, such as optimized node detection methods for large or small concentration variations and the influence of concentration variations on the statistical distribution in the distribution test step.

Disclosure

The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Competing Interests

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The research is sponsored by the National Natural Science Foundation of China (NNSF) under Grant nos. 61463053, 61501337, and 61471275 and by the Natural Science Foundation of Guizhou Province, China, under Grant LH[2015]7549. The authors would like to thank Dr. Li Chai of Wuhan University of Science and Technology for helpful suggestions.