Abstract

Ecological balance is one of the most actively studied topics in the biological, environmental, and Earth sciences. However, due to the complexity of ecosystems, it is not easy to find a single approach that conclusively explains all the potential impacts. In this paper, by considering several important elements, we build a dynamic network model to predict the Earth’s health and try to identify and explain how human behavior and policies affect the model results. Firstly, we empirically analyze both the topological properties and the time-dependent features of nodes and propose an Earth’s health index based on Shannon Entropy. Secondly, we identify the importance of each element by a machine learning approach. Thirdly, we use a spreading model to predict the Earth’s health. Finally, we integrate the topological property and the proposed health index to identify the influential nodes in the observed ecological network. Experimental results show that the oceans are the key nodes affecting the Earth’s health and that big countries are also important nodes. In addition, the results suggest that returning more living lands might be an effective way to resolve the dilemma of ecological balance.

1. Introduction

Ecological balance is one of the most actively studied topics in the biological, environmental, and Earth sciences and many other related disciplines [1], especially since industrialization has now been under way for about two hundred years. To better understand how the biosphere responds to increasing pressure (e.g., population explosion, water and air pollution, climate change), a large body of research has been devoted to finding ways of alleviating the strain on the Earth. However, due to the complexity of ecosystems, it is not easy to find a single approach that conclusively explains all the potential impacts [2] that are responsible for ecological fragility [3, 4]. Among the various approaches, Ecological Network Analysis (ENA) [5, 6] is regarded as one promising methodology for assessing the Earth’s health [7].

Ecological networks can be extracted from various kinds of information, resulting in different kinds of networks. Each node represents a nation, a continent, an ocean, a habitat, or a park, and an edge is present when one node influences the other or the two influence each other, with the influence ranging from economic impact and population flow to environmental pollution, economic level, and so forth [5, 8].

Since the network information is not explicitly provided, we start our research by constructing the global ecological network via extracting nodes and links from the world geography map (see Figure 1). We consider each nation or ocean as one node, and a link is present if the two countries/oceans (or country and ocean) are geographically neighboring. For example, the two nodes, China and Russia, are connected since they are neighboring countries. In addition, there is also a link between the Pacific Ocean and China since they are also mutually connected. Figure 2 shows the constructed network. Furthermore, we also consider five time-dependent factors of each node, including the population size, economic level, habitats area size, energy consumption, and air pollution, which might affect the global ecological environment.
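The construction rule above (one node per nation or ocean, one undirected link per pair of geographic neighbours) can be sketched in a few lines. This is a minimal illustration rather than the procedure actually used to build the network: the function name `build_network`, the node label `"PACIFIC"`, and the three-link toy list are assumptions made for the example. Only the China–Russia and China–Pacific links are named in the text.

```python
from collections import defaultdict


def build_network(neighbour_pairs):
    """Return an undirected adjacency map {node: set of neighbouring nodes}."""
    adjacency = defaultdict(set)
    for a, b in neighbour_pairs:
        if a == b:
            continue  # a node is never its own neighbour
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Toy subset of the map (hypothetical labels, not the full 145-node network).
pairs = [("CHN", "RUS"), ("CHN", "PACIFIC"), ("RUS", "PACIFIC")]
network = build_network(pairs)
```

Storing neighbours as sets keeps the graph simple (no duplicate links) and makes the degree of a node equal to the size of its set.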

After constructing the time-dependent ecological network, we proceed as follows.
(i) Firstly, we observe the network topological properties by formulating both the local features of a node and the global topology of the network.
(ii) Secondly, we investigate how the dynamical factors evolve and how they affect the Earth’s health.
(iii) Thirdly, we use a machine learning algorithm to identify the influential factors of the ecological network.
(iv) Fourthly, we design a spreading model to predict the Earth’s health and perform a sensitivity analysis to test its robustness.
(v) Finally, we use the $k$-core decomposition method [9] to identify the influential nodes by considering factors as weights.

Many studies have adopted operations research (OR) methods, such as the minimum spanning tree (MST), to discover mathematical solutions (e.g., the minimum cost/maximum flow). Hill proposed a matrix method to count the paths of a given length between any two nodes [10]. Finn presented three important indices to evaluate ecosystem flows, namely the Total System Through-Flow (TST), the Average Path Length of an Inflow (APL), and the Cycling Index (CI), which are widely adopted in discussing the mass and energy flow mechanism [11]. However, research from Network Science (NS) [12–19] considers the ecosystem problem from a different perspective. It not only abstracts nodes and their associated properties but also takes into account the various kinds of interactions and functions among them. Therefore, NS-based methods, including link prediction and navigation, have been introduced to discover the potential global topology by analyzing the local structure and dynamics [20, 21].

2. Methods

In this paper, we adopt Network Science to analyze the ecological networks, mainly because of its robustness and its explainable solutions in simulating both the static and dynamical properties of graphs. In this section, we describe the construction of our network model, including the node and edge information, the property formulation, and the dynamical evolution. In particular, our study rests on several necessary assumptions.
(i) All the data and information about the ecological network are collected from reliable statistical databases.
(ii) Both the properties and the topological position of a node in the network are important in evaluating its influence on the Earth’s health.
(iii) The spatial network has a similar structure and similar features to the real ecological network.

2.1. Static Properties

In the constructed network, a graph $G = (V, E)$ is used to describe its structure, where $V$ is the node set and $E$ is the edge set (Table 1 shows the basic information of the observed ecological network). Here, we consider four indices to analyze the static properties of the ecological network.

(i) Degree Index. The degree index [22] indicates how many nodes a nation/ocean connects to. Naturally, the degree index of node $i$ is defined as
$$k_i = \sum_{j} a_{ij}, \tag{1}$$
where $a_{ij} = 1$ if there is a link between nodes $i$ and $j$; otherwise $a_{ij} = 0$.

(ii) Betweenness Index. Betweenness [22] is defined as how many shortest paths pass through the target node. The larger the Betweenness is, the more of a connecting role the target node plays. In a given network, the Betweenness of node $i$ is denoted as
$$B_i = \sum_{s \neq i \neq t} \delta_{st}(i), \tag{2}$$
where $\delta_{st}(i) = 1$ if there is a shortest path between nodes $s$ and $t$ passing through node $i$; otherwise $\delta_{st}(i) = 0$.
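As an illustration of how Betweenness can be computed on an unweighted network, the sketch below uses Brandes' accumulation scheme. Note the assumption: it implements the common fractional variant, in which a pair of nodes joined by several equally short paths contributes the fraction of those paths that pass through the target node, rather than the 0/1 indicator described above. The two coincide whenever shortest paths are unique. The function name and the toy path graph are hypothetical.

```python
from collections import deque


def betweenness(adjacency):
    """Fractional shortest-path betweenness for an unweighted, undirected graph.

    adjacency: {node: set of neighbours}; every neighbour must also be a key.
    """
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adjacency[u]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair (s, t) was visited from both ends.
    return {v: b / 2.0 for v, b in score.items()}


# Toy path graph A - B - C - D: only the two inner nodes carry shortest paths.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = betweenness(path)
```

On the path graph, nodes B and C each lie on two shortest paths between other pairs, while the end nodes lie on none.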

(iii) Closeness Index. Closeness [22] is defined as the reciprocal of the distance between the target node and the other nodes. The larger the Closeness is, the closer the target node is to the rest of the network, and vice versa. In a given network, the Closeness of node $i$ is denoted as
$$C_i = \frac{1}{\sum_{j \neq i} d_{ij}}, \tag{3}$$
where $d_{ij}$ denotes the length of the shortest path between nodes $i$ and $j$.
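The degree and Closeness indices can both be read off an adjacency map, as in the hedged sketch below. It assumes that Closeness is the unnormalised reciprocal of the summed shortest-path lengths and that unreachable nodes are simply ignored; the paper's exact normalisation may differ. All function names and the toy path graph are illustrative.

```python
from collections import deque


def degree(adjacency, node):
    """Number of nodes directly linked to `node`."""
    return len(adjacency[node])


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adjacency, node):
    """Reciprocal of the summed distances to all reachable nodes (assumed form)."""
    dist = shortest_path_lengths(adjacency, node)
    total = sum(d for other, d in dist.items() if other != node)
    return 1.0 / total if total > 0 else 0.0


# Toy path graph A - B - C - D.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
```

For the inner node B the distances are 1, 1, and 2, so its Closeness is 1/4, while the end node A has distances 1, 2, and 3 and a Closeness of 1/6.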

(iv) $k$-Core Index. The $k$-core index [23], denoted as $k_s$, is the core number of a node, that is, the largest value of $k$ for which a $k$-core contains that node. The $k$-core is obtained as follows: (i) remove from the graph all nodes of degree less than $k$ and (ii) repeat this removal on the remaining graph until no further removal is possible. The remaining subgraph, if it exists, is the $k$-core. Thus, a network is organized as a set of successively enclosed $k$-cores.
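The pruning procedure can be sketched as follows. This is a straightforward, unoptimised illustration of core-number computation by repeated removal, not necessarily the implementation used for the results; the function name and the toy graph are assumptions.

```python
def core_numbers(adjacency):
    """Return {node: core number} by peeling off low-degree nodes shell by shell."""
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    core = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k cannot belong to the (k+1)-core.
        to_remove = [v for v, nbrs in remaining.items() if len(nbrs) <= k]
        if not to_remove:
            k += 1  # everything left survives at this level; try the next shell
            continue
        while to_remove:
            v = to_remove.pop()
            if v not in remaining:
                continue  # already removed via another neighbour
            core[v] = k
            for u in remaining.pop(v):
                remaining[u].discard(v)
                if len(remaining[u]) <= k:
                    to_remove.append(u)  # removal cascades to weakened neighbours
    return core


# Toy graph: a triangle A-B-C with a pendant node D attached to A.
toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
cores = core_numbers(toy)
```

In the toy graph the pendant node D is removed in the first shell (core number 1), while the triangle survives until the second shell (core number 2).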

2.2. Dynamical Factors

Besides the static network properties, we collect various data from the World Bank (http://data.worldbank.org/) to investigate the dynamical factors of all the nodes in the ecological network. The dataset includes population size, per capita GDP, area of land and marine, energy consumption per unit of GDP, and carbon dioxide emission of each country from year 1962 to 2011. Specifically, we observe the five following factors for each node.
(i) Population size, denoted as $PS_i(t)$, is the total population of node $i$ in the year $t$.
(ii) Economic level, denoted as $EL_i(t)$, is the per capita GDP of node $i$ in the year $t$.
(iii) Habitats area, denoted as $HA_i(t)$, is the total area of land and marine of node $i$ in the year $t$.
(iv) Energy consumption, denoted as $EC_i(t)$, is the energy consumption per unit of GDP of node $i$ in the year $t$.
(v) Air pollution, denoted as $AP_i(t)$, is the total amount of carbon dioxide emission of node $i$ in the year $t$.

Figure 3 shows how the five factors change from the year 1961 to 2011 (HA starts from 1990 and EC from 1980 because earlier data are absent). Generally, the values of all the factors increase year by year. The figure also shows that population size is strongly and positively related to economic level. For example, China (CHN) and the USA both rank among the top five nations by population size, and their average GDP also ranks highly among all the 126 nations. Meanwhile, both are among the five nations with the worst air pollution, which might suggest that economic development has a negative impact on the environment. Furthermore, we list the top 20 nodes for both static properties and dynamical factors in Table 2. The Atlantic Ocean (ATO) plays the most significant role in maintaining the robustness of the ecological network because it connects the largest number of nodes (Table 3). Russia (RUS) ranks highly in degree, Betweenness, and Closeness but simultaneously has relatively bad air quality (rank number 3). Other nations, such as China (CHN), are in a similar situation. By comparison, the USA does not appear in the top network structure list but has a large population (rank number 3) and a high economic level (rank number 6), which might strengthen its impact on the global Earth’s health.

2.3. Definition of the Earth’s Health Index

Inspired by the previous analyses, we consider that the Earth’s health is affected not by a single factor but by the joint influence of many complicated factors. In this paper, we use the Shannon Entropy [24–26] to integrate the impact of all possible factors and characterize the Earth’s health, denoted by $EH$, as
$$EH = \sum_{i} EH_i, \qquad EH_i = -\sum_{f \in F} p_{if} \log p_{if}, \tag{4}$$
where $p_{if}$ is the normalized fraction of factor $f$ for node $i$, $F$ denotes the set of all the factors defined in Section 2.2, and the final Earth’s health value, $EH$, runs over the sum of all nodes. According to the original definition of Shannon Entropy, the larger the entropy value is, the more equal the distribution will be. Therefore, a large value of $EH$ suggests a well-balanced ecological situation among both nations and factors and hence indicates good Earth’s health, and vice versa (Table 4).
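To make the entropy-based index concrete, the sketch below evaluates the per-node entropy and sums it over nodes. Several points are assumptions rather than statements from the text: the normalized fractions are taken as given inputs (the normalisation itself is not spelled out here), the natural logarithm is used, and zero fractions contribute nothing. The function names and the two-node table are hypothetical.

```python
import math


def node_health(fractions):
    """Shannon entropy of one node's normalized factor fractions (natural log)."""
    return -sum(p * math.log(p) for p in fractions if p > 0)


def earth_health(fraction_table):
    """Sum the per-node entropies; fraction_table maps node -> {factor: fraction}."""
    return sum(node_health(row.values()) for row in fraction_table.values())


# Two hypothetical nodes: one with evenly spread factors, one dominated by a single factor.
table = {
    "balanced": {"PS": 0.25, "EL": 0.25, "HA": 0.25, "EC": 0.25},
    "skewed": {"PS": 0.97, "EL": 0.01, "HA": 0.01, "EC": 0.01},
}
```

The evenly spread node attains the maximum entropy for four factors, log 4, whereas the skewed node scores far lower, which matches the reading that a larger value indicates a more balanced situation.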

2.4. Identifying the Influential Factor via Machine Learning Approach
2.4.1. Random Forest

We use Random Forest [27] to evaluate the importance of factors. Random Forest is an ensemble regressor/classifier that consists of many decision trees and outputs the mean of the values (regression) or the mode of the classes (classification) predicted by the individual trees. In this scenario, we apply the method as a regressor and use it to evaluate the importance of each feature. We choose the Random Forest model over other regression models because of the following advantages:
(i) it can handle high-order variable interactions and correlated predictor variables;
(ii) it can be used not only for prediction but also to assess variable importance;
(iii) it can partially overcome the overfitting problem.

2.4.2. Base Learner: Classification and Regression Tree (CART)

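Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, and a label vector $y \in \mathbb{R}^l$, the decision tree recursively partitions the space such that samples with the same labels are grouped together. Let the data at node $m$ be represented by $Q$. Each candidate split $\theta = (j, t_m)$, consisting of a feature $j$ and a threshold $t_m$, partitions the data into two parts:
$$Q_{\mathrm{left}}(\theta) = \{(x, y) \mid x_j \le t_m\}, \qquad Q_{\mathrm{right}}(\theta) = Q \setminus Q_{\mathrm{left}}(\theta).$$
The impurity at $m$ is computed using an impurity function $H(\cdot)$, the choice of which depends on the task being solved:
$$G(Q, \theta) = \frac{n_{\mathrm{left}}}{N_m} H(Q_{\mathrm{left}}(\theta)) + \frac{n_{\mathrm{right}}}{N_m} H(Q_{\mathrm{right}}(\theta)).$$
Then, select the parameters that minimize the impurity:
$$\theta^{*} = \arg\min_{\theta} G(Q, \theta).$$
After this, recurse for the subsets $Q_{\mathrm{left}}(\theta^{*})$ and $Q_{\mathrm{right}}(\theta^{*})$ until the maximum allowable depth is reached, $N_m < \mathrm{min}_{\mathrm{samples}}$, or $N_m = 1$. In a regression problem, for a node $m$ representing a region $R_m$ with $N_m$ observations, we choose the Mean Squared Error (MSE) as the impurity function $H$:
$$c_m = \frac{1}{N_m} \sum_{i \in N_m} y_i, \qquad H(X_m) = \frac{1}{N_m} \sum_{i \in N_m} (y_i - c_m)^2.$$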
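A single split search with the MSE criterion can be sketched as below. It is a brute-force illustration under stated assumptions: candidate thresholds are taken at the observed feature values (practical implementations often use midpoints), splits that leave one side empty are skipped, and the function names and four-sample toy data are hypothetical.

```python
def mse(values):
    """Mean squared deviation of the targets from their mean (the node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values) / len(values)


def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) minimising the impurity."""
    n = len(y)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # a valid split must send samples to both children
            g = len(left) / n * mse(left) + len(right) / n * mse(right)
            if best is None or g < best[2]:
                best = (j, t, g)
    return best


# Toy data: the target depends only on whether feature 0 exceeds 2.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [1.0, 1.0, 5.0, 5.0]
split = best_split(X, y)
```

On the toy data the search selects feature 0 with threshold 2.0, which separates the two target levels perfectly and yields zero impurity. Growing a full tree would apply the same search recursively to each child.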

2.4.3. Construction of Random Forest

Let the number of training cases be $N$ and let the number of variables in the regressor be $M$. We are given the number $m$ of input variables to be used to determine the decision at a node of the tree; $m$ should be much less than $M$.
(i) Choose a training set for this tree by sampling $N$ times with replacement from all $N$ available training cases (i.e., take a bootstrap sample). Use the rest of the cases to estimate the error of the tree.
(ii) For each node of the tree, randomly choose $m$ variables on which to base the decision at that node. Calculate the best split based on these $m$ variables in the training set.
(iii) Each tree is fully grown and not pruned (as may be done in constructing a normal tree classifier).
For prediction, a new sample is pushed down the tree and is assigned the value of the training sample in the terminal node it ends up in.
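The two sources of randomness in the construction (the bootstrap sample per tree and the random variable subset per node) can be sketched with the standard library alone. The function names, the seed, and the sizes used below are assumptions made for the example.

```python
import random


def bootstrap_sample(n_cases, rng):
    """Draw n_cases indices with replacement; the untouched indices are out-of-bag."""
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_cases) if i not in chosen]
    return in_bag, out_of_bag


def candidate_features(n_variables, m, rng):
    """Pick m distinct variables to consider for the split at one tree node."""
    return rng.sample(range(n_variables), m)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
in_bag, out_of_bag = bootstrap_sample(100, rng)
features = candidate_features(5, 2, rng)
```

Because sampling is done with replacement, some cases appear several times in the bag while others are left out entirely; the left-out cases are the ones available for estimating the error of that tree.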

2.4.4. Out-of-Bag (OOB) Evaluation

To evaluate the Random Forest model, we use 2/3 of the data as the training set, and the remaining 1/3 (the Out-of-Bag data) serve as the test set when constructing each base learner. We average the results over 50 rounds of simulation to reduce random fluctuations. For each tree $t$, the Out-of-Bag (OOB) evaluation proceeds as follows.
(i) Consider the associated $\mathrm{OOB}_t$ sample.
(ii) Denote by $\mathrm{errOOB}_t$ the error of a single tree $t$ on this $\mathrm{OOB}_t$ sample.
(iii) Randomly permute the values of $X^j$ in $\mathrm{OOB}_t$ to get a permuted sample denoted by $\widetilde{\mathrm{OOB}}_t^{\,j}$, and compute $\mathrm{err}\widetilde{\mathrm{OOB}}_t^{\,j}$, the error of predictor $t$ on the perturbed sample. The importance of variable $X^j$ is then
$$VI(X^j) = \frac{1}{n_{\mathrm{tree}}} \sum_{t} \left( \mathrm{err}\widetilde{\mathrm{OOB}}_t^{\,j} - \mathrm{errOOB}_t \right),$$
where $X^j$ represents the $j$th variable of each data point and $n_{\mathrm{tree}}$ is the number of trees.
We then obtain $VI(X^j)$ for each $j$ and normalize it. The bigger $VI(X^j)$ is, the more important the variable is. Figure 4 shows the result for all the five features. Among all the features, the value for habitats area size is the biggest, which makes it the most important factor for the Earth’s health. This agrees with reports in the public media that people, especially those living in cities, now occupy less and less space than before, resulting in comparatively much worse living conditions.
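The permutation step can be sketched independently of how the trees were grown. In the sketch below each tree is represented by a prediction function together with the indices of its Out-of-Bag cases, the error is the mean squared error, and the final scores are normalised to sum to one; the representation, the function names, and the normalisation are assumptions for illustration only.

```python
import random


def mean_squared_error(predict, X, y):
    """Average squared difference between predictions and targets."""
    return sum((predict(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(trees, X, y, rng):
    """trees: list of (predict, oob_indices). Returns one normalised score per variable."""
    n_features = len(X[0])
    importance = [0.0] * n_features
    for predict, oob in trees:
        X_oob = [list(X[i]) for i in oob]
        y_oob = [y[i] for i in oob]
        base = mean_squared_error(predict, X_oob, y_oob)
        for j in range(n_features):
            column = [row[j] for row in X_oob]
            rng.shuffle(column)  # break the link between variable j and the target
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_oob, column)]
            importance[j] += mean_squared_error(predict, permuted, y_oob) - base
    importance = [v / len(trees) for v in importance]
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


# Hypothetical single "tree" whose prediction depends only on variable 0.
X = [[float(i), float(i % 3)] for i in range(30)]
y = [2.0 * row[0] for row in X]
trees = [(lambda row: 2.0 * row[0], list(range(30)))]
scores = permutation_importance(trees, X, y, random.Random(0))
```

Shuffling a variable the predictor relies on inflates the error, whereas shuffling an irrelevant variable leaves it unchanged, so all of the importance in this toy case falls on variable 0.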

2.5. Predicting the Earth’s Health in Ecological Networks

We use a dynamic spreading model [28] to predict the Earth’s health by considering the observed ecological network structure. The model runs as follows.
(i) At the initial step, each node $i$ is assigned a health value by averaging its Earth’s health index ($EH_i$, defined by (4)) over the most recent five years.
(ii) We then choose 10% of the nodes as “seed” nodes and add an increment to each seed node. The increment is calculated by averaging the incremental values of the most recent five years.
(iii) At each time step $t$, each node $i$ affects the Earth’s health index of each of its neighbouring nodes $j$ through an influence term, the Earth’s health influence from node $i$ to node $j$, which is controlled by two tunable parameters.
(iv) Then, node $i$’s Earth’s health index at the next time step is obtained by summing the influences over all of $i$’s neighbours, that is, over all nodes $j$ with $a_{ij} = 1$, where $a_{ij} = 1$ if there is a link between node $i$ and node $j$, and $a_{ij} = 0$ otherwise.
(v) Finally, the global Earth’s health at time step $t$ is obtained by summing over all the nodes: $EH(t) = \sum_{i} EH_i(t)$.
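The overall loop of such a spreading process can be sketched as follows. The influence rule is deliberately left as a function supplied by the caller, because its exact form is not recoverable from the description here; the "equal share" rule used in the toy run is a purely hypothetical stand-in and not the rule of the model. The function name, the seed choice, and all numbers are likewise assumptions.

```python
def spread(adjacency, initial_health, seeds, increment, influence, steps):
    """Iterate a neighbour-to-neighbour spreading process on an undirected network.

    influence(h_j, k_j) -> amount that neighbour j (health h_j, degree k_j)
    passes to each adjacent node; this is a caller-supplied, hypothetical rule.
    """
    health = dict(initial_health)
    for s in seeds:
        health[s] += increment  # seed nodes receive the extra increment
    history = [sum(health.values())]  # global value = sum over all nodes
    for _ in range(steps):
        health = {
            i: sum(influence(health[j], len(adjacency[j])) for j in adjacency[i])
            for i in adjacency
        }
        history.append(sum(health.values()))
    return health, history


# Toy triangle; each node hands an equal share of its value to every neighbour.
triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
start = {"A": 1.0, "B": 1.0, "C": 1.0}
final, totals = spread(triangle, start, ["A"], 0.5, lambda h, k: h / k, steps=3)
```

With the equal-share stand-in the global total is conserved from step to step, which makes the toy run easy to check by hand; a rule with tunable parameters would generally change the total over time.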

3. Results

3.1. Performance Comparison

To test the performance of our spreading model, we fix its two tunable parameters and predict the global Earth’s health by computer simulation. In addition, we use Gaussian Fitting as a baseline for comparison. Figure 5 shows the comparison results. Our model fits the real data better than Gaussian Fitting does. In addition, the proposed EH index shows that the Earth’s health has been getting worse since 2008, which warns us that we should pay much more attention to our environment. Correspondingly, the results from Section 2.4.4 suggest a possible solution: returning more living lands might be the most effective way to resolve this dilemma.

3.2. Sensitivity Analysis

We then perform a sensitivity analysis to test the robustness of our model. We randomly delete a given fraction of the links and check whether the model remains reliable. Figure 6 reports the prediction results of the model for various values of this fraction. The prediction result is quite robust even when a large fraction of the links, 60% for instance, is removed. Therefore, it can be concluded that our model is reliable for predicting the Earth’s health.
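The link-deletion step of such a robustness test can be sketched as below. The sketch only produces the pruned network; rerunning the prediction on it is left out. The function name, the rounding of the number of deleted links, the seed, and the five-node ring are assumptions made for the example.

```python
import random


def remove_links(adjacency, fraction, rng):
    """Return a copy of the network with a random fraction of its links deleted."""
    # Each undirected link is listed once as a sorted pair.
    edges = sorted({tuple(sorted((u, v))) for u in adjacency for v in adjacency[u]})
    n_remove = round(fraction * len(edges))
    removed = set(rng.sample(edges, n_remove))
    pruned = {u: set() for u in adjacency}  # isolated nodes are kept
    for u, v in edges:
        if (u, v) not in removed:
            pruned[u].add(v)
            pruned[v].add(u)
    return pruned


# Toy ring of five nodes (five links); delete 60% of the links.
ring = {
    "a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d", "a"},
}
pruned = remove_links(ring, 0.6, random.Random(1))
```

Keeping every node in the pruned network, even when it loses all of its links, means the set of nodes being predicted stays the same across deletion fractions, so the results remain comparable.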

3.3. Identifying the Influential Node in Ecological Networks

Our Earth’s health index is designed to diagnose and predict the global Earth’s health status. In addition, to find which node (namely, which nation or ocean) plays the most important role in affecting the Earth’s health, we perform a further analysis to rank node importance. We integrate the $k$-core value of node $i$ (see Section 2.1) and its Earth’s health index into a single importance score for that node.

Figure 7 plots the importance of each node in affecting the Earth’s health against the corresponding rank, with some typical nodes marked. The oceans (with the highest rank) are indeed the key nodes affecting the Earth’s health; the USA, Russia, and China are also important nations in influencing the Earth’s health. Some small nations, such as Madagascar (MDG) and Iraq (IRQ), play less important roles in the Earth’s health.

4. Conclusion and Discussion

In this paper, we collect various data and construct a 145-node world ecological network (126 nations and 19 oceans/seas, connected by 403 edges), with each node representing a nation or an ocean and each edge representing the geographical neighboring relationship of the corresponding two nodes. Firstly, we analyze both the topological properties and the time-dependent features of nodes. Secondly, we propose an Earth’s health index based on Shannon Entropy. Thirdly, we identify the importance of elements by a machine learning approach (Random Forest). Fourthly, we design a spreading model to predict the Earth’s health and perform a sensitivity analysis to test its robustness. Finally, we integrate the topological property ($k$-core index) and the health index to identify the influential nations in the observed ecological network.

The model results indicate that the oceans (with the highest rank) are indeed the key nodes affecting the Earth’s health. The big countries, for example, the USA, Russia, and China, are also important nations in influencing the Earth’s health. Correspondingly, the results suggest a possible solution: returning more living lands might be the most effective way to resolve this dilemma. The combination of topological properties and local factors leads to good performance in predicting both the good and the bad trends of the Earth’s health. The model can be easily extended by considering more factors. However, our model needs empirical support from more data. Also, the incremental mechanism may hinder long-term prediction.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant nos. 11105024, 1147015, 11305043, and 11205040), the EU FP7 Grant 611272 (project GROWTHCOM), and the start-up foundation and Pandeng project of Hangzhou Normal University.