Abstract and Applied Analysis


Research Article | Open Access

Volume 2021 |Article ID 6691678 | https://doi.org/10.1155/2021/6691678

Ghizlane Kouaiba, Driss Mentagui, "Resolution of the Min-Max Optimization Problem Applied in the Agricultural Sector with the Estimation of Yields by Nonparametric Statistical Approaches", Abstract and Applied Analysis, vol. 2021, Article ID 6691678, 18 pages, 2021. https://doi.org/10.1155/2021/6691678

Resolution of the Min-Max Optimization Problem Applied in the Agricultural Sector with the Estimation of Yields by Nonparametric Statistical Approaches

Academic Editor: Paul Eloe
Received 02 Oct 2020
Accepted 03 Mar 2021
Published 15 Apr 2021


The ultimate objective of the problem under study is to apply the min-max tool in order to optimize the default risks arising in several areas. The agricultural sector is one example: there, the default risk is optimized using silage crops, annual consumption requirements, and the crops produced in a given year. To minimize the future default risk, we first forecast the total agricultural investment budget for the next 20 years and then distribute this budget efficiently between irrigation and the construction of silos. Bangladesh, a large agricultural country in South Asia, was chosen as the empirical case study given the availability of its data on the FAO website. In this article, we give a detailed study of the agricultural planning model through a calculation algorithm intended to be coded in the R software. Our approach is based on an original statistical modeling using nonparametric statistics, illustrated by a simulation involving agricultural data from Bangladesh. We also consider a new pollution model, which leads to a vector optimization problem. Graphs illustrate our quantitative analysis.

1. Introduction

The study is based on quantifying the risk that the need exceeds the production together with the quantity already ensiled. The same quantification is applied to the case where the production of a given year exceeds both the need and the capacity of the silos for that year. The idea is to calculate these default risks over the next 20 years, based on the total investment amounts allocated to irrigation and to the construction of silos planned over these 20 years. The risks are computed by 3 calculation methods simulated over iterations (distributions of the amounts between irrigation and the construction of silos) and are then optimized by a vector Pareto optimization algorithm applied to the defaults calculated with and without the pollution assumption. The optimal strategies (those belonging to the Pareto front) are linked to the investment amounts they assign to irrigation and to the construction of silos, given the total of the planned investment amounts. It is then up to the decision-maker to choose one allocation strategy among all the optimal strategies retained.

2. The Problem Schematization

2.1. The General Idea of the Work

Minimize the difference between the production, the requirement, and the quantity to be removed or ensiled, after having maximized the risk over 3 scenarios that are explained below.

The variables are, for each year, the total cultivated area, the surface area of irrigated land, the surface area of nonirrigated land, the total production, the yield of irrigated land, and the yield of nonirrigated land.

Knowing the requirement of a given year, the difference between the production of that year and the requirement is expressed as follows:

However, the quantities to ensile or to consume are often not equal, respectively, to the capacity of the silo or to the harvest stock for a given year, which requires an optimization of resources. To do this, we need to calculate the default, whose formula involves:
(i) the silo capacity in a given year
(ii) the quantity ensiled in the silo in that year
(iii) the budget allocated to the construction of the silo for that year
(iv) the cost price of a unit of silo capacity
(v) the budget allocated to irrigation for that year
(vi) the expenditure per unit of irrigated land
(vii) the product requirements in that year

Therefore, the total budget is given by
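The bookkeeping described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `YearData`, the function names, the sign convention of the gap (production minus requirement), and the rule "quantity bought = budget / unit cost" are all hypothetical choices, not the notation or formulas of the model itself.

```python
from dataclasses import dataclass


@dataclass
class YearData:
    """Quantities for one year (hypothetical field names)."""
    irrigated_area: float       # ha of irrigated land
    nonirrigated_area: float    # ha of nonirrigated land
    irrigated_yield: float      # tonnes/ha on irrigated land
    nonirrigated_yield: float   # tonnes/ha on nonirrigated land
    requirement: float          # tonnes required for consumption


def total_area(year: YearData) -> float:
    # Total cultivated area = irrigated + nonirrigated surfaces.
    return year.irrigated_area + year.nonirrigated_area


def total_production(year: YearData) -> float:
    # Each type of land contributes its surface times its yield.
    return (year.irrigated_area * year.irrigated_yield
            + year.nonirrigated_area * year.nonirrigated_yield)


def production_gap(year: YearData) -> float:
    # Assumed sign convention: positive means a surplus to ensile,
    # negative means a shortfall to draw from the silos.
    return total_production(year) - year.requirement


def total_budget(irrigation_budget: float, silo_budget: float) -> float:
    # The total investment is split between irrigation and silos.
    return irrigation_budget + silo_budget


def quantity_bought(budget: float, unit_cost: float) -> float:
    # Assumption: a budget buys budget / unit_cost units
    # (hectares of irrigation or units of silo capacity).
    if unit_cost <= 0:
        raise ValueError("unit_cost must be positive")
    return budget / unit_cost
```

For instance, `production_gap(YearData(2.0, 3.0, 10.0, 2.0, 20.0))` evaluates a year with 2 ha irrigated at 10 tonnes/ha and 3 ha nonirrigated at 2 tonnes/ha against a requirement of 20 tonnes.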

3. Fundamental Elements for the Calculation

(1) Initial conditions: the values of the model quantities for 2000 and 2001 are fixed by assumption.

(2) Total agricultural areas, areas of irrigated and nonirrigated land

The data were collected from the FAO site for the selected country, Bangladesh (total agricultural area, in million ha, and irrigated land areas) [1, 2].

The surface area of nonirrigated land has been deduced as shown in Table 1.

(3) Irrigation and silo construction budgets: based on FAO data (taking the unit costs of irrigation and of silo construction from [3] and [4], respectively, the latter for the case of tank stores), and based on the total crop production and the irrigated land area, we obtain Table 2.

Table 1: Year; Irrigated lands (1000 ha); Nonirrigated lands (1000 ha).


Table 2: Year; Irrigation expenses (million USD); Cost for the construction of silos (million USD); Simulated requirements (million tonnes).


It is also assumed that 75% of the total production is devoted to cereals; therefore, the total production can be approximated by the production of cereals.

(4) Irrigated and nonirrigated land yields: based on FAO publications [5], the yields of irrigated land range between 5 tonnes/ha and 13 tonnes/ha for an irrigation of 4500 m3/ha, and those of nonirrigated land range from 0 tonnes/ha to 5 tonnes/ha. We therefore run a numerical simulation of the respective yield values of irrigated and nonirrigated land to deduce the simulated productions, assuming that these yields do not take the pollution factor into account (the yields of the supposedly polluted land are deduced later) (see Table 3).

Table 3: Year; Irrigated lands yield (tonnes/ha); Nonirrigated lands yield (tonnes/ha).


In order to determine the distribution of the yields and to deduce their expectations, nonparametric methods are the subject of the next part of this study.

4. Nonparametric Estimation of Yield Density: Theoretical Study

4.1. The Histogram Method

The basic idea is to partition the interval containing the observations into subintervals of equal length, called the window.

In addition, for / and .

Each subinterval has an associated count, the number of observed values falling in it, and we seek the density at a continuous value belonging to that same subinterval.

The density function [6] associated to is in the following form:

Such that .

Therefore, the problem comes down to estimating the probability vector:

The question that arises is how to choose the window; this comes down to choosing the number of intervals so as to obtain a well-smoothed distribution, avoiding both an “oversmoothed” histogram when the window is too large and an “undersmoothed” histogram when the window tends towards zero.

(1) The quadratic risk of the estimator

The proposed solution is to establish a risk function and to minimize it with respect to the window; the function suited to this problem is the expectation of the quadratic deviation between the estimator and the true density, which decomposes into a squared bias and a variance.

However, the count in each interval follows a binomial law: the law of the number of successes of an experiment with a given success probability, repeated as many times as there are observations.

Therefore, we will have which implies and that

Finally, the expectation of the quadratic deviation of the estimator is written as follows:

(2) The integrated quadratic risk

For the risk to depend on the window only, it is convenient to integrate the quadratic risk just mentioned over the interval:

The speed of convergence of the integrated risk depends on the window, which leads us to reformulate the bias and the variance through a Taylor expansion of the density:

Using the first-order Taylor expansion at the point considered, we have


Proof of Equation 19. We rely on the definition of a negligible function [7] at a point: a function is said to be negligible with respect to another function that does not vanish in a neighborhood of the point (except possibly at the point itself) if and only if the ratio of the two functions tends to zero at that point. The remainder of the first-order Taylor expansion is negligible in this sense, which yields the bias of the estimator after calculation.

For its part, the variance follows from the law of the counts. If the relevant quantity is approximated by 1, the variance takes a simplified form.

However, the quantification of this risk function, in other words the determination of the optimal window, must be carried out over the entire interval; the integral is therefore applied only to the nonnegligible part of the risk. Finding the optimal window amounts to finding the argument at which the derivative of the integrated risk vanishes. However, we do not have the distribution of the observations from which to deduce it. To do this, it is necessary to go through the “cross-validation” estimator.

(3) The “cross-validation” formula of the estimation

According to the formulation of the integrated risk, the calculation of the optimal window turns out to be difficult in practice; hence, the idea is to calculate an estimate of the risk instead, which amounts to calculating the “cross-validation” formula (see Figure 1).
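The histogram estimator and a cross-validation choice of the number of intervals can be sketched as follows. This is a minimal illustration, not the program of the study: it assumes equal-width intervals on a known range `[a, b]` and uses the standard leave-one-out score for histograms, 2/((n-1)h) - (n+1)/((n-1)h) times the sum of squared bin proportions. All function names are hypothetical.

```python
def bin_counts(data, a, b, m):
    """Count the observations falling in each of m equal-width bins on [a, b]."""
    h = (b - a) / m
    counts = [0] * m
    for x in data:
        if not a <= x <= b:
            raise ValueError("observation outside [a, b]")
        # The right endpoint b is assigned to the last bin.
        counts[min(int((x - a) / h), m - 1)] += 1
    return counts, h


def histogram_density(x, data, a, b, m):
    """Histogram estimate at x: (count of the bin containing x) / (n * h)."""
    if not a <= x <= b:
        return 0.0
    counts, h = bin_counts(data, a, b, m)
    return counts[min(int((x - a) / h), m - 1)] / (len(data) * h)


def cv_score(data, a, b, m):
    """Leave-one-out cross-validation score of the histogram with m bins."""
    n = len(data)
    counts, h = bin_counts(data, a, b, m)
    sum_sq = sum((c / n) ** 2 for c in counts)
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * sum_sq


def best_bin_count(data, a, b, max_bins=30):
    """Number of bins (hence window) minimizing the cross-validation score."""
    return min(range(1, max_bins + 1), key=lambda m: cv_score(data, a, b, m))
```

The window is then recovered as `(b - a) / best_bin_count(data, a, b)`. Scanning a fixed grid of bin counts is a design choice made for simplicity; a finer or adaptive search would work equally well.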

4.2. The Kernel Method

The kernel method is applied when the continuity of the distribution function is ensured (the distribution function belongs to a class of regular functions). Let us consider, as a first step, an estimate of the distribution function built from independent and identically distributed observations.

Let .

The strong law of large numbers gives almost sure convergence; the calculation is as follows:

And finally, the convergence holds almost surely.

The derivative of the function is given by

This implies

Such an estimator is known as the Rosenblatt estimator.
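As an illustration, the Rosenblatt estimator can be sketched as a symmetric finite difference of the empirical distribution function. The function names and the window argument `h` are assumptions made for this sketch; the textbook form used is (F_n(x + h) - F_n(x - h)) / (2h).

```python
def ecdf(x, sample):
    """Empirical distribution function: share of observations <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


def rosenblatt_density(x, sample, h):
    """Finite-difference derivative of the empirical distribution function."""
    if h <= 0:
        raise ValueError("the window h must be positive")
    return (ecdf(x + h, sample) - ecdf(x - h, sample)) / (2.0 * h)
```

This amounts to counting the observations within distance `h` of `x`, which is why the estimator can be read as a kernel estimator with a uniform kernel.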

Generally, we put [8]

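The function appearing in this estimator is called the kernel of the estimate; common choices of kernel [9] include:
(i) The triangular kernel: $K(u) = (1 - |u|)\,\mathbf{1}_{\{|u| \le 1\}}$
(ii) The parabolic kernel: $K(u) = \tfrac{3}{4}(1 - u^2)\,\mathbf{1}_{\{|u| \le 1\}}$
(iii) The Gaussian kernel: $K(u) = \tfrac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$

(1) The quadratic risk of the estimator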

The mean square deviation as mentioned in the histogram approach is defined as follows:

Definition 1. Let be an interval in and the pair , the class Hölder defined on is the set of differentiable functions which satisfy

Definition 2. Let be an integer; we would say that is a kernel function of order [10] if the functions , are integrable and which satisfy

In the following, we will focus on the kernel function of order 2.

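Definition 3 (Cauchy-Schwarz inequality). If $f$ and $g$ are two square-integrable functions on a common domain, then the absolute value of the integral of their product satisfies the following inequality:
$$\left| \int f(x)\, g(x)\, dx \right| \le \left( \int f(x)^2\, dx \right)^{1/2} \left( \int g(x)^2\, dx \right)^{1/2}.$$

Let us calculate the upper bound of the bias [11, 12]: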

Let .

Going through a Taylor expansion of order 2 of , we have with .

However, we previously assumed that the kernel function is of order 2, and therefore,

The hypothesis of implies


Following the Cauchy-Schwarz inequality, we have the following inequality:

Knowing that the calculation of the expectation is done on the interval if and vice versa.


Likewise, the variance of is calculated as follows:

The maximum of a probability :

Put .


Computing the optimal window amounts to minimizing the upper bound of the quadratic risk.

(2) The integrated quadratic risk

Computing the optimal window from the integrated risk amounts to deducing it from the following integral:

(3) The “cross-validation” formula of the estimation

As mentioned previously, the cross-validation formula of the estimation of distributions is defined by

Let us say and show that is an unbiased estimator of .

This amounts to demonstrating that

Under the assumption that the observations are independent and identically distributed, we have
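The kernel estimator and its cross-validation criterion can be sketched for the Gaussian kernel as follows. This is a sketch under assumptions: it uses the usual least-squares cross-validation score (integral of the squared estimate minus twice the average leave-one-out estimate at the observations), the closed form of the Gaussian kernel convolved with itself, and hypothetical function names.

```python
import math


def gauss(u):
    """Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gauss_conv(u):
    """Gaussian kernel convolved with itself (density of N(0, 2))."""
    return math.exp(-0.25 * u * u) / math.sqrt(4.0 * math.pi)


def kde(x, sample, h):
    """Kernel density estimate at x with window h."""
    return sum(gauss((x - xi) / h) for xi in sample) / (len(sample) * h)


def lscv(sample, h):
    """Least-squares cross-validation score for the window h (needs n >= 2)."""
    n = len(sample)
    # Exact integral of the squared estimate, via the self-convolution.
    integral_sq = sum(gauss_conv((xi - xj) / h)
                      for xi in sample for xj in sample) / (n * n * h)
    # Sum over i of the leave-one-out estimate evaluated at observation i.
    loo = sum(gauss((xi - xj) / h)
              for i, xi in enumerate(sample)
              for j, xj in enumerate(sample) if i != j) / ((n - 1) * h)
    return integral_sq - 2.0 * loo / n


def select_window(sample, candidates):
    """Candidate window with the smallest cross-validation score."""
    return min(candidates, key=lambda h: lscv(sample, h))
```

A call such as `select_window(yields, [0.1 * k for k in range(1, 31)])` scans a grid of windows; the grid itself is an arbitrary choice for the example.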


5. Theoretical Calculation of Strategies

Recall that the default risk for a given year, as discussed by Moiseev [13], is calculated as follows:

The nonparametric estimate of yield expectations will allow the calculation of strategies for the distribution of budgets between irrigation and the construction of silos over the next N years; these strategies are given by

The project strategy can be calculated in another way:

Strictly positive or strictly negative defaults do not have the same meaning. This leads us to calculate the budget allocation strategy using the following formula:

The problem of optimizing the min-max comes down to choosing the optimal strategy for the allocation of budgets that minimizes the risk of default; this amounts to finding the following scalar minimum:
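The min-max principle behind this step can be sketched generically: each candidate allocation strategy is scored by its worst default risk over the simulated scenarios, and the strategy with the smallest worst case is retained. The table of risks and the strategy names below are hypothetical, and the sketch does not reproduce the strategy formulas of this section.

```python
def worst_case(risks):
    """Largest default risk of one strategy across all scenarios."""
    return max(risks)


def minmax_choice(risk_table):
    """Return (strategy, worst risk) minimizing the worst-case risk.

    risk_table maps a strategy name to its list of default risks,
    one value per scenario (hypothetical input format).
    """
    worst = {name: worst_case(risks) for name, risks in risk_table.items()}
    best = min(worst, key=worst.get)
    return best, worst[best]
```

With `{"A": [1.0, 4.0, 2.0], "B": [3.0, 3.0, 3.0], "C": [0.5, 5.0, 1.0]}`, strategy `"B"` is retained: it is never the best in any single scenario, but its worst case is the smallest.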

In the case where the pollution factor is considered, the new default risk values will be obtained from the formula below:

Here, the yield of irrigated land, the yield of nonirrigated land, and the quantities to ensile or to consume are recomputed by integrating the pollution factor into the modeling of the default risk. A vector optimization is then carried out between the strategies calculated without pollution and those calculated with pollution; this leads us to calculate:

Then, minimize the following vector:

6. Multiobjective Pareto Optimization (Pareto Front)

A multiobjective optimization problem presents itself, in general, as follows [14, 15]:

6.1. Dominance Principle and Pareto Optimum

Consider an order relation on a space of finite dimension, and let two vectors of the decision space be given together with their images in the objective space. We say that the first vector dominates [16] the second if and only if [17] it is at least as good on every objective and strictly better on at least one.

A solution of the multiobjective problem is said to be Pareto optimal [18] if no point of the decision space dominates it, that is, if no other point is at least as good on every objective and strictly better on at least one. All of these nondominated points in the Pareto sense constitute the Pareto front [19]. In our example, the vector of irrigation prices over the next 20 years is the vector of decisions, and the vector of objectives is the vector to be minimized defined in Section 5.

Notice that “yes” means the corresponding identity holds and “no” means the identity does not hold.

Once the strategies are simulated, we obtain a data frame with two columns and one row per simulated strategy (see Figure 2). By applying the principle of dominance, the efficient strategies (Pareto front strategies) are then calculated by the algorithm of Figure 3.
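The dominance principle and the extraction of the Pareto front can be sketched as follows, assuming that every objective is to be minimized and that each strategy is represented by a tuple of objective values. This is a simple quadratic-time illustration with hypothetical names, not the algorithm of Figure 3.

```python
def dominates(u, v):
    """True if u is no worse than v on every objective and better on one."""
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates (minimization)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

For the two-column case of this study, each tuple would hold the default risk without pollution and the default risk with pollution for one simulated strategy.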

7. Empirical Part

Following a Shapiro-Wilk normality test [20], the p value of the test (>5%) indicates that the hypothesis that the yields of irrigated and nonirrigated land follow a Gaussian law cannot be rejected (Figure 4).

Under the assumption that these yields are normally distributed, we calculate their expectations through a nonparametric estimate of the density by the kernel method with a Gaussian kernel, using the program shown in Figures 5 and 6.

Since the hypothesis of normality is not rejected for the yields of nonirrigated and irrigated land, the median and the mean should be equal, the Gaussian law being symmetric.

Through the “summary” command, we notice that the arithmetic mean differs from the median, so it is not a reliable estimate of the expectation.

By using the kernel estimate, the average obtained corresponds to the median; this means that the estimate of the density by this method turns out to be robust for the calculation of the expected yields of nonirrigated and irrigated land.

We assume that the pollution factor can reduce the yields of irrigated and nonirrigated land by 12% and 20%, respectively. The average of the returns by the kernel method makes it possible to deduce that tonnes/ha and tonnes/ha.

The yields of irrigated and nonirrigated land were simulated by the Box-Muller numerical method, respecting the ranges of values given by the FAO site (from 0 to 5 tonnes/ha for nonirrigated land and more than 5 tonnes/ha for irrigated land).
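A Box-Muller simulation of yields restricted to a given range can be sketched as follows. The mean, standard deviation, and the rejection of draws falling outside the range are assumptions made for this illustration; the exact way the study enforces the FAO intervals is not specified here.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard normal draws from two uniforms."""
    u1 = 1.0 - rng.random()   # lies in (0, 1], so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def simulate_yields(n, mean, sd, low, high, seed=0):
    """Simulate n yields (tonnes/ha), keeping only draws in [low, high]."""
    rng = random.Random(seed)
    yields = []
    while len(yields) < n:
        for z in box_muller(rng):
            y = mean + sd * z
            # Assumption: out-of-range draws are simply rejected.
            if low <= y <= high and len(yields) < n:
                yields.append(y)
    return yields
```

For example, `simulate_yields(16, 2.5, 1.0, 0.0, 5.0)` would produce sixteen nonirrigated yields in the 0 to 5 tonnes/ha range; the values 2.5 and 1.0 are placeholders, not estimates from the study.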

The nonsimulated data series (areas of irrigated land) cover the period 2001-2016 (source: FAO site); the areas of irrigated land are then forecast over the period 2017-2036.

Total agricultural million ha (to deduct the surface areas of nonirrigated land).

The total budget to be invested will be forecast [21, 22] with an ARIMA time series model, then distributed between irrigation and the construction of silos.

If a process follows an ARIMA model, then it is represented by the following relation, whose elements are:
(i) a white noise
(ii) the order of differencing of the process
(iii) the delay operator
(iv) and (v) the autoregressive and moving-average polynomials in the delay operator

If follows an ARIMA () process, so follows an ARIMA () process.
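For reference, the standard textbook form of an ARIMA($p$, $d$, $q$) model can be written as below. The symbols $X_t$, $\varepsilon_t$, $B$, $\phi_i$, and $\theta_j$ are assumed notation and may differ from the notation of the original relation.

```latex
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr)\,(1 - B)^{d} X_t
  \;=\; \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr)\,\varepsilon_t,
\qquad B X_t = X_{t-1},
```

where $\varepsilon_t$ is white noise. Under this convention, the differenced series $(1 - B)^{d} X_t$ follows an ARMA($p$, $q$) process.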

In practice, the series of budgets was not stationary according to the augmented Dickey-Fuller test (p value > 5%): the null hypothesis of a unit root cannot be rejected, so stationarity is not supported.

After two differencings, the series of investment budgets became stationary (p value < 5%), as shown in the code in Figure 7.
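The differencing step itself can be sketched in a few lines; the function name is hypothetical, and the stationarity test (augmented Dickey-Fuller) is deliberately left out of this sketch.

```python
def difference(series, d=1):
    """Apply d successive first differences to a list of numbers."""
    result = list(series)
    for _ in range(d):
        # Each pass replaces the series by x[t] - x[t-1].
        result = [later - earlier
                  for earlier, later in zip(result, result[1:])]
    return result
```

Each pass shortens the series by one value and removes one degree of polynomial trend, so a series with a quadratic trend becomes constant in mean after two passes.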

To determine the autoregressive and moving-average orders of the series, we use the partial autocorrelation function (PACF) and the autocorrelation function (ACF), respectively (Figures 8 and 9, respectively).

From the two graphs, we can first propose candidate values for these two orders (beyond these values, the PACF and ACF practically vanish).

To confirm the choice of the appropriate model, we build the matrix of AIC (Akaike Information Criterion) values of the ARIMA models with two differencings, the autoregressive and moving-average orders each ranging from 1 to 3, and retain the model with the minimum AIC (Figure 10).
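The selection of the orders by minimum AIC can be sketched as a grid search. The callable `fit_aic` is a hypothetical placeholder for whatever routine fits an ARIMA model and returns its AIC; no fitting is implemented in this sketch.

```python
def select_order(fit_aic, d, p_values, q_values):
    """Return the (p, q) pair with the smallest AIC and the full AIC table.

    fit_aic(p, d, q) is assumed to fit an ARIMA(p, d, q) model and
    return its AIC as a float (hypothetical interface).
    """
    table = {(p, q): fit_aic(p, d, q)
             for p in p_values for q in q_values}
    best = min(table, key=table.get)
    return best, table
```

Calling `select_order(fit_aic, 2, range(1, 4), range(1, 4))` reproduces the 3 by 3 grid described above, with the differencing order fixed at 2.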

The AIC matrix confirms that the model to be retained is ARIMA (2, 2, 1).

The forecast of this series, computed in the software over the period 2017-2036, is represented by the graph in Figure 11.

The budgets allocated to irrigation and to the construction of silos over the period 2001-2016 are built from the respective unit prices; their forecasts (2017-2036) are then obtained by simulating the distribution of the series of investment budgets between irrigation and the construction of silos with numerical simulation methods (we go through 4 distribution simulations of the total investment budget).

The product quantity requirements for the year over the period 2017-2036 will be simulated; after the calculation of the “delta” faults under the software, we move on to calculate the () with over the period 2017-2036.

An example of calculation of and for the ; the number of simulation of () with is (