Resolution of the Min-Max Optimization Problem Applied in the Agricultural Sector with the Estimation of Yields by Nonparametric Statistical Approaches

Kouaiba, Ghizlane; Mentagui, Driss

https://doi.org/10.1155/2021/6691678

Abstract and Applied Analysis

Research Article | Open Access

Volume 2021 | Article ID 6691678 | https://doi.org/10.1155/2021/6691678

Academic Editor: Paul Eloe

Received 02 Oct 2020

Accepted 03 Mar 2021

Published 15 Apr 2021

Abstract

The ultimate objective of this study is to apply the min-max tool to optimize default risks arising in several areas. In the agricultural sector, for example, the default risk must be optimized using the following elements: silage crops, annual consumption requirements, and the crops produced in a given year. To minimize the future default risk, we first forecast the total agricultural investment budget for the next 20 years and then distribute this budget efficiently between irrigation and the construction of silos. Bangladesh, a large agricultural country in South Asia, was chosen as an empirical case study because its data are available on the FAO website. In this article, we give a detailed and original study of the agricultural planning model through a calculation algorithm designed to be coded in the R software. Our approach is based on original statistical modeling using nonparametric statistics, illustrated by a simulation involving agricultural data from Bangladesh. We also consider a new pollution model, which leads to a vector optimization problem. Graphs illustrate our quantitative analysis.

1. Introduction

The study is based on quantifying two risks: the risk that the requirement exceeds the production together with the quantity already ensiled, and the risk that the production of a given year exceeds both the requirement and the silo capacity of the same year. The idea is to calculate these default risks over the next 20 years, based on the total investment amounts planned for irrigation and for the construction of silos over this period. The distribution of these amounts between irrigation and silo construction is simulated iteratively via 3 calculation methods, and the resulting strategies are optimized with a vector (Pareto) optimization algorithm applied to the defaults calculated under the pollution assumption and to those calculated without it. The strategies belonging to the Pareto front, called optimal strategies, are each linked to their amounts of investment in irrigation and in silo construction, based on the total planned investment. It is then up to the decision-maker to choose, among all the optimal strategies retained, a strategy for allocating these investment amounts.

2. The Problem Schematization

2.1. The General Idea of the Work:

The aim is to minimize the difference between the production $P_t$, the requirement $B_t$, and the quantity to be removed from or placed in the silos, after having maximized the risk through 3 scenarios that are explained below.

The variables are as follows: $S_t$ is the total cultivated area in year $t$, $S_t^{I}$ is the area of irrigated land (with yield $p$) in year $t$, $S_t^{N}$ is the area of nonirrigated land (with yield $q$) in year $t$, and $P_t$ is the total production in year $t$; $p$ and $q$ are the yields of irrigated and nonirrigated land, respectively. Hence

$$S_t = S_t^{I} + S_t^{N}, \qquad P_t = p\,S_t^{I} + q\,S_t^{N}.$$

Knowing the requirement $B_t$ of year $t$, the difference between the production of year $t$ and the requirement is

$$\Delta_t = P_t - B_t.$$

However, the quantities to ensile or to consume are often not equal, respectively, to the silo capacity or to the harvest stock of a given year, which requires an optimization of resources. To do this, we calculate the default $D_t$ from $\Delta_t$ and the following quantities:

(i) $C_t$: the silo capacity in year $t$
(ii) $V_t$: the quantity ensiled in the silo in year $t$
(iii) $K_t$: the budget allocated to the construction of silos in year $t$
(iv) $c_S$: the cost price of a unit of silo capacity
(v) $R_t$: the budget allocated to irrigation in year $t$
(vi) $c_I$: the expenditure per unit of irrigated land
(vii) $B_t$: the product requirement in year $t$

Therefore, the total budget of year $t$ is given by

$$T_t = R_t + K_t.$$
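A few lines of Python make the yearly balance of this section concrete. This is a minimal sketch under stated assumptions: production is taken as the irrigated area times its yield plus the nonirrigated area times its yield, the balance is production minus requirement, and the total budget is the sum of the irrigation and silo budgets. The function names and all numbers are hypothetical and are not FAO data.

```python
def production(area_irrigated: float, area_nonirrigated: float,
               yield_irrigated: float, yield_nonirrigated: float) -> float:
    """Total production (tonnes): irrigated area x its yield + nonirrigated area x its yield."""
    return area_irrigated * yield_irrigated + area_nonirrigated * yield_nonirrigated


def balance(production_t: float, requirement_t: float) -> float:
    """Difference between production and requirement: > 0 surplus, < 0 shortage."""
    return production_t - requirement_t


def total_budget(irrigation_budget: float, silo_budget: float) -> float:
    """Total yearly budget, assumed to be the irrigation budget plus the silo budget."""
    return irrigation_budget + silo_budget


if __name__ == "__main__":
    # Hypothetical year: 5 Mha irrigated at 6 t/ha, 3 Mha nonirrigated at 3 t/ha.
    p_t = production(5.0e6, 3.0e6, 6.0, 3.0)
    print(p_t, balance(p_t, 4.0e7))  # 39000000.0 -1000000.0
```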

3. Fundamental Elements for the Calculation

(1) Initial conditions: the values of the model variables for the reference years 2000 and 2001 are fixed by assumption.

(2) Total agricultural areas and areas of irrigated and nonirrigated land: the data were collected from the FAO site for Bangladesh (total agricultural area, in million ha, and areas of irrigated land) [1, 2]. The areas of nonirrigated land were deduced from them, as shown in Table 1.

(3) Irrigation and silo construction budgets: based on FAO data, taking the unit costs of irrigation and of silo construction from [3] and [4], respectively (the latter for the case of tank stores), and on the total crop production and the irrigated land area, we obtain Table 2. It is also assumed that 75% of the total production consists of cereals; therefore, the total production can be approximated by the production of cereals.

(4) Yields of irrigated and nonirrigated land: according to FAO publications [5], the yields of irrigated land range from 5 tonnes/ha to 13 tonnes/ha with an irrigation of 4500 m³/ha, and from 0 to 5 tonnes/ha for nonirrigated land. We therefore numerically simulate the yields of irrigated and nonirrigated land to deduce the simulated productions, assuming that these yields do not take the pollution factor into account (the yields of land assumed to be polluted are deduced later) (see Table 3).

In order to determine the distribution functions of the yields $p$ and $q$ and deduce their expectations, we turn to nonparametric methods, which are the subject of the next part of this study.

4. Nonparametric Estimation of Yield Density: Theoretical Study

4.1. The Histogram Method

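The basic idea is to partition the interval $[a, b]$ containing the observations $X_1, \dots, X_n$ into $m$ intervals (bins) of length $h$:

$$B_j = [a + (j-1)h,\; a + jh), \quad j = 1, \dots, m, \qquad h = \frac{b - a}{m}.$$

Let $n_j$ be the number of observed values in the interval $B_j$, and let $x$ be a point of the same interval at which we want to estimate the density.

The density estimate [6] at $x \in B_j$ takes the following form:

$$\hat f_h(x) = \frac{n_j}{nh}, \qquad \text{such that } \sum_{j=1}^{m} n_j = n.$$

Therefore, the problem comes down to estimating the probability vector $(p_1, \dots, p_m)$, where

$$p_j = P(X \in B_j) = \int_{B_j} f(u)\,du, \qquad \hat p_j = \frac{n_j}{n}.$$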
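A minimal Python sketch of this histogram estimator is given below, assuming all observations lie in $[a, b]$; the function name `histogram_density` is ours. A point equal to $b$ is placed in the last bin so that the bins cover the closed interval.

```python
def histogram_density(data, a, b, m):
    """Histogram density estimate on [a, b] with m bins of width h = (b - a) / m.

    Returns (h, counts, p_hat, f_hat), where counts[j] = n_j, p_hat[j] = n_j / n
    and f_hat(x) = n_j / (n h) for x in bin B_j (0 outside [a, b]).
    """
    n = len(data)
    h = (b - a) / m

    def bin_index(x):
        # 0-based bin [a + j h, a + (j + 1) h); x == b goes to the last bin
        return min(int((x - a) / h), m - 1)

    counts = [0] * m
    for x in data:
        if not a <= x <= b:
            raise ValueError("all observations must lie in [a, b]")
        counts[bin_index(x)] += 1
    p_hat = [c / n for c in counts]

    def f_hat(x):
        if x < a or x > b:
            return 0.0
        return counts[bin_index(x)] / (n * h)

    return h, counts, p_hat, f_hat


if __name__ == "__main__":
    h, counts, p_hat, f_hat = histogram_density([0.1, 0.2, 0.6, 0.9], 0.0, 1.0, 2)
    print(h, counts, p_hat, f_hat(0.3))  # 0.5 [2, 2] [0.5, 0.5] 1.0
```

Because each bin has height $\hat p_j/h$ and width $h$, the estimate integrates to 1 whenever every observation lies in $[a, b]$.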

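The question that arises is how to choose $h$, which comes down to choosing the number of intervals $m$ so as to obtain a well-smoothed histogram: one should avoid the so-called "oversmoothed" histogram, obtained when $h$ is too large, and the opposite case of an "undersmoothed" histogram, obtained when $h$ tends to zero.

(1) The quadratic risk of $\hat f_h$

The solution proposed here is to define a risk function and minimize it as a function of the bin width $h$. The appropriate function for this problem is the expectation of the squared deviation between $\hat f_h(x)$ and $f(x)$:

$$R(x, h) = E\left[(\hat f_h(x) - f(x))^2\right] = \left(E[\hat f_h(x)] - f(x)\right)^2 + \operatorname{Var}(\hat f_h(x)).$$

However, $\hat f_h(x) = n_j/(nh)$ and $n_j \sim \mathcal{B}(n, p_j)$, the binomial law of the number of successes when an experiment with success probability $p_j$ is repeated $n$ times.

Therefore, $E[n_j] = np_j$ and $\operatorname{Var}(n_j) = np_j(1 - p_j)$, which implies

$$E[\hat f_h(x)] = \frac{p_j}{h} \quad \text{and} \quad \operatorname{Var}(\hat f_h(x)) = \frac{p_j(1 - p_j)}{nh^2}.$$

Finally, the expectation of the quadratic deviation of $\hat f_h(x)$ is written as follows:

$$R(x, h) = \left(\frac{p_j}{h} - f(x)\right)^2 + \frac{p_j(1 - p_j)}{nh^2}.$$

(2) The integrated quadratic risk of $\hat f_h$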

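To express the risk as a function of $h$ alone, it is convenient to integrate $R(x, h)$ over the interval $[a, b]$:

$$\mathrm{MISE}(h) = \int_a^b R(x, h)\,dx.$$

The rate of convergence of $\mathrm{MISE}(h)$ depends on $h$, which leads us to rewrite $p_j$ through a limited (Taylor) expansion of $f$.

Using the first-order Taylor expansion of $f$ at the point $x$, for $u, x \in B_j$, we have

$$f(u) = f(x) + (u - x)f'(x) + o(h), \qquad p_j = \int_{B_j} f(u)\,du = hf(x) + hf'(x)(c_j - x) + o(h^2),$$

where $c_j = a + (j - \tfrac12)h$ is the midpoint of $B_j$.

Therefore,

$$\mathrm{MISE}(h) = \frac{h^2}{12}\int_a^b f'(x)^2\,dx + \frac{1}{nh} + o\!\left(h^2 + \frac{1}{nh}\right).$$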

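Proof of the expansion of $\mathrm{MISE}(h)$. We consider that $u$ and $x$ belong to the same bin $B_j$, so that $|u - x| \le h$.
Moreover, we use the definition of a negligible function [7] at a point $a$: let $\varphi$ and $\psi$ be functions such that $\psi$ does not vanish on a neighborhood of $a$, except possibly at $a$; $\varphi$ is said to be negligible compared with $\psi$ in the neighborhood of $a$, written $\varphi = o(\psi)$, if and only if $\lim_{u \to a}\varphi(u)/\psi(u) = 0$. Since the remainder of the Taylor expansion is negligible compared with $u - x$ and $|u - x| \le h$, it is $o(h)$ on $B_j$.
Therefore, the bias at $x \in B_j$ is $E[\hat f_h(x)] - f(x) = p_j/h - f(x) = f'(x)(c_j - x) + o(h)$, where $c_j$ is the midpoint of $B_j$. However, $\int_{B_j}(c_j - x)^2\,dx = h^3/12$; hence, treating $f'$ as constant on each bin, $\int_{B_j}\left(f'(x)(c_j - x)\right)^2dx \approx f'(c_j)^2h^3/12$, and summing over the bins gives $\int_a^b \mathrm{bias}^2(x)\,dx = \frac{h^2}{12}\sum_j h\,f'(c_j)^2 + o(h^2) = \frac{h^2}{12}\int_a^b f'(x)^2\,dx + o(h^2)$. For its part, the variance is given by $\operatorname{Var}(\hat f_h(x)) = p_j(1 - p_j)/(nh^2)$. If we approximate the quantity $1 - p_j$ by 1 (the bins are small), the variance can be expressed as $\operatorname{Var}(\hat f_h(x)) \approx p_j/(nh^2)$. Finally, $\int_a^b \operatorname{Var}(\hat f_h(x))\,dx \approx \sum_{j=1}^{m} h\,\frac{p_j}{nh^2} = \frac{1}{nh}$, since $\sum_j p_j = 1$. The quantification of this risk function, in other words the determination of the optimal $h$, must be carried out over the entire interval $[a, b]$; therefore, the minimization is applied only to the nonnegligible part of $\mathrm{MISE}(h)$. The optimal $h$ corresponds to finding $h^* = \arg\min_h\left(\frac{h^2}{12}\int_a^b f'(x)^2\,dx + \frac{1}{nh}\right)$. This leads us to look for the point at which the derivative vanishes, $\frac{h}{6}\int_a^b f'(x)^2\,dx - \frac{1}{nh^2} = 0$. Therefore, $h^* = \left(\frac{6}{n\int_a^b f'(x)^2\,dx}\right)^{1/3}$. However, we do not know the distribution $f$, so we cannot deduce $\int_a^b f'(x)^2\,dx$.
To do this, it is necessary to go through the "cross-validation" estimator.

(3) The "cross-validation" formula of the estimation

According to the formulation of the integrated risk, the calculation of $\mathrm{MISE}(h)$ turns out to be difficult in practice, because $\mathrm{MISE}(h) = E\int\hat f_h^2 - 2E\int\hat f_h f + \int f^2$ depends on the unknown $f$. Since the last term does not depend on $h$, the idea is to minimize $J(h) = \mathrm{MISE}(h) - \int f^2$ instead, which amounts to calculating the "cross-validation" formula (see Figure 1):

$$\hat J(h) = \int\hat f_h^2(x)\,dx - \frac{2}{n}\sum_{i=1}^{n}\hat f_{h,(-i)}(X_i),$$

where $\hat f_{h,(-i)}$ is the histogram built without the observation $X_i$. If we estimate $p_j$ by $\hat p_j = n_j/n$, we have $\int\hat f_h^2 = \frac1h\sum_j\hat p_j^2$ and $\hat f_{h,(-i)}(X_i) = \frac{n_j - 1}{(n - 1)h}$ for $X_i \in B_j$. Finally, we have

$$\hat J(h) = \frac{2}{(n - 1)h} - \frac{n + 1}{(n - 1)h}\sum_{j=1}^{m}\hat p_j^2.$$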
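Both bin-width rules can be checked numerically. The Python sketch below (function names ours) evaluates the closed-form cross-validation score $\hat J(h) = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h}\sum_j \hat p_j^2$ and picks the number of bins $m$ that minimizes it. As a point of comparison, it also evaluates $h^* = \left(6/(n\int f'^2)\right)^{1/3}$ for a Gaussian reference density, an assumption added only for illustration: for $\mathcal{N}(\mu, \sigma^2)$, $\int f'^2 = 1/(4\sqrt{\pi}\,\sigma^3)$, so $h^* = (24\sqrt{\pi}\,\sigma^3/n)^{1/3} \approx 3.49\,\sigma n^{-1/3}$.

```python
import math
import random


def cv_score(data, a, b, m):
    """Closed-form cross-validation score of the histogram with m bins on [a, b]:
    J_hat(h) = 2 / ((n - 1) h) - (n + 1) / ((n - 1) h) * sum_j p_hat_j^2."""
    n = len(data)
    h = (b - a) / m
    counts = [0] * m
    for x in data:
        if not a <= x <= b:
            raise ValueError("all observations must lie in [a, b]")
        counts[min(int((x - a) / h), m - 1)] += 1
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * sum_p2


def select_bins(data, a, b, max_bins=100):
    """Number of bins m (bin width h = (b - a) / m) that minimizes the CV score."""
    return min(range(1, max_bins + 1), key=lambda m: cv_score(data, a, b, m))


def normal_reference_h(sigma, n):
    """h* = (6 / (n * int f'^2))^(1/3) when f is the N(mu, sigma^2) density."""
    int_fprime_sq = 1.0 / (4.0 * math.sqrt(math.pi) * sigma ** 3)
    return (6.0 / (n * int_fprime_sq)) ** (1.0 / 3.0)


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated N(0, 1) sample, clipped to [-5, 5] so that it lies in [a, b].
    sample = [min(max(rng.gauss(0.0, 1.0), -5.0), 5.0) for _ in range(1000)]
    m = select_bins(sample, -5.0, 5.0)
    print("CV bin width:", 10.0 / m, "Gaussian reference h*:", normal_reference_h(1.0, 1000))
```

The cross-validation route needs no assumption on $f$, whereas the reference rule is only as good as the Gaussian approximation it relies on.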

4.2. The Kernel Method

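The kernel method applies when the continuity of the distribution function is ensured (the distribution function $F$ belongs to the class $C^1$). Let us consider, as a first step, the empirical distribution function

$$\hat F_n(x) = \frac1n\sum_{i=1}^{n}\mathbf{1}\{X_i \le x\},$$

where the $X_i$ are i.i.d.

Let $Y_i = \mathbf{1}\{X_i \le x\}$, so that $E[Y_i] = P(X_i \le x) = F(x)$.

The strong law of large numbers then indicates that, almost surely,

$$\hat F_n(x) = \frac1n\sum_{i=1}^{n} Y_i \longrightarrow E[Y_1] = F(x).$$

And finally, $\hat F_n(x) \to F(x)$ almost surely.

The derivative of the function $F$ is given by

$$f(x) = F'(x) = \lim_{h \to 0}\frac{F(x + h) - F(x - h)}{2h}.$$

This suggests the estimator

$$\hat f_n(x) = \frac{\hat F_n(x + h) - \hat F_n(x - h)}{2h} = \frac{1}{2nh}\sum_{i=1}^{n}\mathbf{1}\{x - h < X_i \le x + h\}.$$

Such an estimator is known as the Rosenblatt estimator.

More generally, we set [8]

$$\hat f_n(x) = \frac{1}{nh}\sum_{i=1}^{n}K\!\left(\frac{X_i - x}{h}\right),$$

which reduces to the Rosenblatt estimator for $K(u) = \frac12\mathbf{1}\{|u| \le 1\}$.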
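The estimator $\hat f_n(x) = \frac{1}{nh}\sum_i K\big((X_i - x)/h\big)$ can be written directly in Python. The sketch below, with function names of our own choosing, implements it for the uniform kernel $\frac12\mathbf{1}\{|u| \le 1\}$ (which reproduces the Rosenblatt estimator) and for the triangular, parabolic and Gaussian kernels listed next. It also computes the Rosenblatt estimator a second way, from the empirical distribution function $\hat F_n$; the two forms agree except when an observation falls exactly on $x - h$, because the Rosenblatt indicator excludes that endpoint.

```python
import math


def uniform(u):       # 1/2 on [-1, 1]: gives the Rosenblatt estimator
    return 0.5 if abs(u) <= 1.0 else 0.0


def triangular(u):
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0


def parabolic(u):     # Epanechnikov kernel
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(data, h, kernel=gaussian):
    """Return the function x -> (1 / (n h)) * sum_i K((X_i - x) / h)."""
    n = len(data)
    return lambda x: sum(kernel((xi - x) / h) for xi in data) / (n * h)


def rosenblatt(data, h):
    """Return x -> (F_n(x + h) - F_n(x - h)) / (2 h), F_n the empirical CDF."""
    n = len(data)

    def ecdf(t):
        return sum(1 for xi in data if xi <= t) / n

    return lambda x: (ecdf(x + h) - ecdf(x - h)) / (2.0 * h)


if __name__ == "__main__":
    data = [0.1, 0.4, 0.5, 0.9]
    print(kde(data, 0.25, uniform)(0.45), rosenblatt(data, 0.25)(0.45))  # 1.0 1.0
```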

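$K$ is called the kernel of the estimator; commonly used kernels include the following [9]:
(i) The triangular kernel: $K(u) = (1 - |u|)\,\mathbf{1}\{|u| \le 1\}$
(ii) The parabolic (Epanechnikov) kernel: $K(u) = \frac34(1 - u^2)\,\mathbf{1}\{|u| \le 1\}$
(iii) The Gaussian kernel: $K(u) = \frac{1}{\sqrt{2\pi}}e^{-u^2/2}$

(1) The quadratic risk of $\hat f_n$

As in the histogram approach, the mean squared error at a point $x_0$ is defined as follows:

$$\mathrm{MSE}(x_0) = E\left[(\hat f_n(x_0) - f(x_0))^2\right] = b^2(x_0) + \sigma^2(x_0),$$

with the bias $b(x_0) = E[\hat f_n(x_0)] - f(x_0)$ and the variance $\sigma^2(x_0) = \operatorname{Var}(\hat f_n(x_0))$.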

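Definition 1. Let $T$ be an interval in $\mathbb{R}$, let $\beta > 0$ and $L > 0$, and let $\ell$ be the greatest integer strictly less than $\beta$. The Hölder class $\Sigma(\beta, L)$ on $T$ is the set of $\ell$ times differentiable functions $f: T \to \mathbb{R}$ whose derivative $f^{(\ell)}$ satisfies

$$\left|f^{(\ell)}(x) - f^{(\ell)}(x')\right| \le L\,|x - x'|^{\beta - \ell}, \qquad \forall x, x' \in T.$$

Definition 2. Let $\ell \ge 1$ be an integer; we say that $K: \mathbb{R} \to \mathbb{R}$ is a kernel function of order $\ell$ [10] if the functions $u \mapsto u^jK(u)$, $j = 0, 1, \dots, \ell$, are integrable and satisfy

$$\int K(u)\,du = 1, \qquad \int u^jK(u)\,du = 0, \quad j = 1, \dots, \ell.$$

In the following, we focus on a kernel function of order 2.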
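Definition 2 can be checked numerically by computing the moments $\int u^jK(u)\,du$. The Python sketch below (names ours; midpoint-rule integration on the truncated interval $[-12, 12]$ is a simplifying assumption) returns the largest order $\ell$ for which the conditions hold. Under this definition, any nonnegative kernel, such as the Gaussian or parabolic kernel, has $\int u^2K(u)\,du > 0$ and is therefore of order 1; a kernel of order 2 or more must take negative values, as does the kernel $\frac12(3 - u^2)\varphi(u)$ built from the standard normal density $\varphi$, which is of order 3.

```python
import math


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parabolic(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def gaussian_order3(u):
    # (3 - u^2) * phi(u) / 2: integrates to 1 and its moments 1, 2, 3 vanish
    return 0.5 * (3.0 - u * u) * gaussian(u)


def moment(kernel, j, lo=-12.0, hi=12.0, steps=100_000):
    """Midpoint-rule approximation of int u^j K(u) du over [lo, hi]."""
    du = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * du
        total += u ** j * kernel(u)
    return total * du


def kernel_order(kernel, max_order=4, tol=1e-6):
    """Largest l <= max_order with int K = 1 and int u^j K = 0 for j = 1..l
    (Definition 2); 0 means that even the first moment does not vanish."""
    if abs(moment(kernel, 0) - 1.0) > tol:
        raise ValueError("the kernel does not integrate to 1")
    order = 0
    for j in range(1, max_order + 1):
        if abs(moment(kernel, j)) > tol:
            break
        order = j
    return order


if __name__ == "__main__":
    for name, k in [("gaussian", gaussian), ("parabolic", parabolic),
                    ("gaussian_order3", gaussian_order3)]:
        print(name, kernel_order(k))
```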

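Definition 3 (Cauchy-Schwarz inequality). If $g_1$ and $g_2$ are two square-integrable functions on $T$, then the absolute value of the integral of their product satisfies the following inequality:

$$\left|\int_T g_1(u)g_2(u)\,du\right| \le \left(\int_T g_1^2(u)\,du\right)^{1/2}\left(\int_T g_2^2(u)\,du\right)^{1/2}.$$

Let us calculate the upper bound of the bias [11, 12]: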

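Let $b(x_0) = E[\hat f_n(x_0)] - f(x_0)$ denote the bias at a point $x_0$. Since the $X_i$ are identically distributed, the change of variable $z = x_0 + uh$ gives

$$b(x_0) = \frac1h\int K\!\left(\frac{z - x_0}{h}\right)f(z)\,dz - f(x_0) = \int K(u)\left[f(x_0 + uh) - f(x_0)\right]du,$$

using $\int K(u)\,du = 1$.

Going through a Taylor expansion of order 2 of $f$ (with $f \in \Sigma(\beta, L)$ and $\ell = 2$), we have

$$f(x_0 + uh) = f(x_0) + f'(x_0)\,uh + \frac{(uh)^2}{2}f''(x_0 + \tau uh), \qquad \text{with } \tau = \tau(u) \in [0, 1].$$

However, we previously assumed that the kernel function is of order 2, so that $\int uK(u)\,du = \int u^2K(u)\,du = 0$, and therefore

$$b(x_0) = \frac{h^2}{2}\int u^2K(u)\left[f''(x_0 + \tau uh) - f''(x_0)\right]du.$$

The hypothesis $f \in \Sigma(\beta, L)$ implies

$$\left|f''(x_0 + \tau uh) - f''(x_0)\right| \le L\,|\tau uh|^{\beta - 2} \le L\,|uh|^{\beta - 2}.$$

Taking absolute values inside the integral, we obtain the following inequality:

$$|b(x_0)| \le \frac{h^2}{2}\int u^2|K(u)|\,L\,|uh|^{\beta - 2}\,du = \frac{Lh^\beta}{2}\int|u|^\beta|K(u)|\,du.$$

Here, the integral is taken over $[-1, 1]$ when $K$ vanishes outside $[-1, 1]$ (as for the triangular and parabolic kernels) and over $\mathbb{R}$ otherwise.

Finally,

$$|b(x_0)| \le C_1h^\beta, \qquad C_1 = \frac{L}{2}\int|u|^\beta|K(u)|\,du.$$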

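Likewise, since the $X_i$ are i.i.d., the variance of $\hat f_n(x_0)$ is calculated as follows:

$$\sigma^2(x_0) = \frac{1}{nh^2}\operatorname{Var}\!\left(K\!\left(\frac{X_1 - x_0}{h}\right)\right) \le \frac{1}{nh^2}E\!\left[K^2\!\left(\frac{X_1 - x_0}{h}\right)\right] = \frac{1}{nh}\int K^2(u)\,f(x_0 + uh)\,du.$$

Assuming that the density is bounded by its maximum, $f(x) \le f_{\max}$ for all $x$, we obtain

$$\sigma^2(x_0) \le \frac{f_{\max}}{nh}\int K^2(u)\,du.$$

Put $C_2 = f_{\max}\int K^2(u)\,du$.

Finally,

$$\mathrm{MSE}(x_0) = b^2(x_0) + \sigma^2(x_0) \le C_1^2h^{2\beta} + \frac{C_2}{nh}.$$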
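Setting the derivative $2\beta C_1^2h^{2\beta - 1} - C_2/(nh^2)$ of the bound $C_1^2h^{2\beta} + C_2/(nh)$ to zero gives $h^* = \left(C_2/(2\beta C_1^2n)\right)^{1/(2\beta + 1)}$, so that $h^*$ decreases like $n^{-1/(2\beta + 1)}$ and the bound is of order $n^{-2\beta/(2\beta + 1)}$. The short Python sketch below (function names ours; the constants $C_1$, $C_2$, $\beta$ are arbitrary illustrative values) compares this closed form with a brute-force grid search over $h$.

```python
def risk_bound(h, c1, c2, beta, n):
    """Upper bound C1^2 h^(2 beta) + C2 / (n h) of the pointwise MSE."""
    return c1 ** 2 * h ** (2.0 * beta) + c2 / (n * h)


def optimal_bandwidth(c1, c2, beta, n):
    """Closed-form minimizer h* = (C2 / (2 beta C1^2 n))^(1 / (2 beta + 1))."""
    return (c2 / (2.0 * beta * c1 ** 2 * n)) ** (1.0 / (2.0 * beta + 1.0))


def grid_bandwidth(c1, c2, beta, n, h_min=1e-3, h_max=2.0, steps=20_000):
    """Brute-force minimizer of the bound over an evenly spaced grid of h."""
    grid = [h_min + k * (h_max - h_min) / steps for k in range(steps + 1)]
    return min(grid, key=lambda h: risk_bound(h, c1, c2, beta, n))


if __name__ == "__main__":
    c1, c2, beta, n = 1.0, 1.0, 2.5, 1000   # hypothetical constants
    print(optimal_bandwidth(c1, c2, beta, n), grid_bandwidth(c1, c2, beta, n))
```

In practice $C_1$ and $C_2$ depend on the unknown $f$ (through $L$ and $f_{\max}$), which is why a data-driven criterion such as cross-validation is needed.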

Computing the optimal $h$ amounts to minimizing this upper bound of $\mathrm{MSE}(x_0)$ with respect to $h$.

(2) The integrated quadratic risk of $\hat f_n$

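The integrated risk is deduced from $\mathrm{MSE}$ by integrating over $x$:

$$\mathrm{MISE}(h) = E\int(\hat f_n(x) - f(x))^2\,dx = \int\mathrm{MSE}(x)\,dx = \int b^2(x)\,dx + \int\sigma^2(x)\,dx,$$

because, by Fubini's theorem, the expectation and the integral can be interchanged.

(3) The "cross-validation" formula of the estimation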

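As mentioned previously, the cross-validation criterion $\mathrm{CV}(h)$ for density estimation is defined by

$$\mathrm{CV}(h) = \int\hat f_n^2(x)\,dx - \frac2n\sum_{i=1}^{n}\hat f_{n,-i}(X_i), \qquad \hat f_{n,-i}(x) = \frac{1}{(n - 1)h}\sum_{j \ne i}K\!\left(\frac{X_j - x}{h}\right),$$

which implies that $E[\mathrm{CV}(h)] = \mathrm{MISE}(h) - \int f^2(x)\,dx$, so that minimizing $\mathrm{CV}(h)$ targets the minimizer of $\mathrm{MISE}(h)$.

Let us set $\hat G = \frac1n\sum_{i=1}^{n}\hat f_{n,-i}(X_i)$ and show that $\hat G$ is an unbiased estimator of $E\int\hat f_n(x)f(x)\,dx$.

This amounts to demonstrating that

$$E\left[\frac1n\sum_{i=1}^{n}\hat f_{n,-i}(X_i)\right] = E\left[\int\hat f_n(x)f(x)\,dx\right].$$

Under the assumption that the $X_i$ are i.i.d., we have

$$E[\hat G] = E[\hat f_{n,-1}(X_1)] = \frac1hE\left[K\!\left(\frac{X_2 - X_1}{h}\right)\right] = \frac1h\iint K\!\left(\frac{z - x}{h}\right)f(z)f(x)\,dz\,dx.$$

Likewise,

$$E\left[\int\hat f_n(x)f(x)\,dx\right] = \int f(x)\,\frac1h\int K\!\left(\frac{z - x}{h}\right)f(z)\,dz\,dx,$$

which is the same quantity.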
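For the Gaussian kernel, $\mathrm{CV}(h) = \int\hat f_n^2 - \frac2n\sum_i\hat f_{n,-i}(X_i)$ has a closed form, because the convolution $K * K$ is the $\mathcal{N}(0, 2)$ density, so that $\int\hat f_n^2 = \frac{1}{n^2h}\sum_{i,j}(K * K)\big((X_i - X_j)/h\big)$. The Python sketch below (names ours; plain $O(n^2)$ loops, so it is only meant for small samples) evaluates $\mathrm{CV}(h)$ and selects the bandwidth minimizing it over a user-supplied grid.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)
SQRT_4PI = math.sqrt(4.0 * math.pi)


def int_fhat_squared(data, h):
    """int f_hat(x)^2 dx for the Gaussian kernel (K * K is the N(0, 2) density)."""
    n = len(data)
    s = sum(math.exp(-((xi - xj) / h) ** 2 / 4.0) for xi in data for xj in data)
    return s / (n * n * h * SQRT_4PI)


def loo_mean(data, h):
    """(1 / n) * sum_i f_hat_{-i}(X_i), each f_hat_{-i} built without X_i."""
    n = len(data)
    s = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
            for i, xi in enumerate(data)
            for j, xj in enumerate(data) if i != j)
    return s / (n * (n - 1) * h * SQRT_2PI)


def cv(data, h):
    """CV(h) = int f_hat^2 - (2 / n) * sum_i f_hat_{-i}(X_i)."""
    return int_fhat_squared(data, h) - 2.0 * loo_mean(data, h)


def select_bandwidth(data, grid):
    """Bandwidth of the grid with the smallest CV score."""
    return min(grid, key=lambda h: cv(data, h))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    grid = [0.05 * k for k in range(2, 31)]
    print("CV bandwidth:", select_bandwidth(sample, grid))
```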

5. Theoretical Calculation of Strategies

Recall that the default risk for a given year, as discussed by Moiseev [13], is calculated as follows:

The nonparametric estimates of the expected yields allow us to calculate strategies for distributing the budgets between irrigation and the construction of silos over the next $N$ years; these strategies are given by

The project strategy can be calculated in another way:

Strictly positive or strictly negative defaults do not have the same meaning. This leads us to calculate the budget allocation strategy using the following formula:

The min-max optimization problem comes down to choosing the optimal budget-allocation strategy, that is, the one that minimizes the risk of default; this amounts to finding the following scalar minimum:
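The scalar min-max criterion can be illustrated with a short Python sketch. Purely for illustration, we assume that each strategy is summarized by its vector of yearly defaults and that the criterion is the worst (largest) weighted yearly default; the weights `w_pos` and `w_neg`, which let positive and negative defaults count differently, are hypothetical parameters, and the function names are ours.

```python
def worst_default(defaults, w_pos=1.0, w_neg=1.0):
    """Largest weighted yearly default of one strategy (positive and negative
    defaults may be weighted differently, since they do not mean the same thing)."""
    return max(w_pos * d if d >= 0 else w_neg * (-d) for d in defaults)


def minmax_strategy(defaults_by_strategy, w_pos=1.0, w_neg=1.0):
    """Index and value of the strategy minimizing the worst yearly default."""
    worst = [worst_default(ds, w_pos, w_neg) for ds in defaults_by_strategy]
    best = min(range(len(worst)), key=worst.__getitem__)
    return best, worst[best]


if __name__ == "__main__":
    strategies = [[1.0, -5.0, 2.0], [3.0, 3.0, -3.0], [0.0, 4.0, -1.0]]
    print(minmax_strategy(strategies))             # (1, 3.0)
    print(minmax_strategy(strategies, w_neg=0.5))  # (0, 2.5)
```

The second call shows how the retained strategy can change when one sign of default is penalized less than the other.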

In the case where the pollution factor is considered, the new default risk values will be obtained from the formula below:

The yield of irrigated land, the yield of nonirrigated land, and the quantities to ensile or to consume are then recalculated by integrating the pollution factor into the model of the default risk. We then perform a vector optimization between the strategies calculated as a function of the default risk without pollution and those calculated as a function of the default risk with pollution; this leads us to calculate:

Then, minimize the following vector:

6. Multiobjective Pareto Optimization (Pareto Front)

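A multiobjective optimization problem generally takes the following form [14, 15]:

$$\min_{x \in X} F(x) = \left(f_1(x), \dots, f_k(x)\right),$$

where $X$ is the decision space and $f_1, \dots, f_k$ are the objective functions.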

6.1. Dominance Principle and Pareto Optimum

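For a given order relation in an objective space of dimension $k$, let $x$ and $y$ be two vectors of the decision space, with objective vectors $F(x) = (f_1(x), \dots, f_k(x))$ and $F(y)$. We say that $x$ dominates [16] $y$, and we write $x \prec y$, if and only if [17]

$$f_i(x) \le f_i(y) \ \ \forall i \in \{1, \dots, k\} \quad \text{and} \quad \exists j \in \{1, \dots, k\}: f_j(x) < f_j(y).$$

A solution $x^*$ of the multiobjective problem is said to be Pareto optimal [18] if no point $x$ of the decision space dominates $x^*$, that is, if there is no $x$ such that $f_i(x) \le f_i(x^*)$ for each $i$ with $f_j(x) < f_j(x^*)$ for at least one $j$. The set of all these nondominated points in the Pareto sense constitutes the Pareto front [19]. In our example, the vector of irrigation amounts over the next 20 years is the decision vector, and the objective vector consists of the default risk calculated without pollution and the default risk calculated with pollution.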

Notice that “yes” means the corresponding identity holds and “no” means the identity does not hold.

Once the strategies are simulated, we obtain a data frame with two columns and one row per simulated strategy (see Figure 2). By applying the dominance principle, the efficient strategies (the Pareto front strategies) are then calculated with the algorithm of Figure 3.
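The dominance test and the Pareto filter can be sketched in Python as follows, assuming that both objectives (for example, the default risk without pollution and the default risk with pollution) are to be minimized. As in the algorithm described in the text, dominated strategies are first marked with a missing value (here `None`, the analogue of R's `NA`) and then removed; the function names are ours, and the pairwise comparison is $O(n^2)$ in the number of strategies.

```python
def dominates(u, v):
    """True if u dominates v for minimization: u_i <= v_i for all i, u_j < v_j for some j."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def mark_pareto(points):
    """Copy of points in which every dominated point is replaced by None (NA)."""
    return [None if any(dominates(q, p) for q in points) else p for p in points]


def pareto_front(points):
    """Nondominated points only (the missing values are dropped)."""
    return [p for p in mark_pareto(points) if p is not None]


if __name__ == "__main__":
    strategies = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 2)]
    print(mark_pareto(strategies))   # [(1, 5), None, None, (4, 1), (2, 2)]
    print(pareto_front(strategies))  # [(1, 5), (4, 1), (2, 2)]
```

A point never dominates itself (the strict inequality fails), so the loop does not need to exclude the point being tested.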

7. Empirical Part

Following a Shapiro-Wilk normality test [20], the p-value of the test (> 5%) indicates that the hypothesis that the yields of irrigated and nonirrigated land follow a Gaussian law is not rejected (Figure 4).

Under the assumption that these yields are normally distributed, we choose to calculate their expectations through a nonparametric kernel estimate of the density with a Gaussian kernel, using the program shown in Figures 5 and 6.

Since the normality hypothesis for the yields of nonirrigated and irrigated land is not rejected, the median and the mean should be equal, because the Gaussian law is symmetric.

Using the "summary" command, we notice that the arithmetic mean differs from the median, so we do not consider it a reliable estimate of the expectation.

Using the kernel estimate, the average obtained coincides with the median; this suggests that the density estimate obtained by this method is robust for calculating the expected yields of nonirrigated and irrigated land (in tonnes/ha).
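The expectation implied by a kernel density estimate is $\int x\,\hat f_n(x)\,dx$. For a symmetric kernel this integral equals the sample mean exactly: substituting $x = X_i - hu$ gives $\frac1n\sum_i\int(X_i - hu)K(u)\,du = \bar X_n$, because $\int uK(u)\,du = 0$. A location summary of $\hat f_n$ that differs from the arithmetic mean therefore comes from another functional of the estimate, such as its mode. The Python sketch below (names ours; midpoint-rule integration on a truncated grid, with hypothetical simulated yields) computes both quantities for a Gaussian kernel.

```python
import math
import random
import statistics


def gaussian_kde(data, h):
    """x -> (1 / (n h)) * sum_i phi((X_i - x) / h)."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((xi - x) / h) ** 2) for xi in data)


def _grid(data, h, steps):
    """Midpoints of `steps` cells covering [min - 8h, max + 8h], and the cell width."""
    lo, hi = min(data) - 8.0 * h, max(data) + 8.0 * h
    dx = (hi - lo) / steps
    return [lo + (k + 0.5) * dx for k in range(steps)], dx


def kde_mean(data, h, steps=4000):
    """int x f_hat(x) dx by the midpoint rule."""
    f = gaussian_kde(data, h)
    xs, dx = _grid(data, h, steps)
    return sum(x * f(x) for x in xs) * dx


def kde_mode(data, h, steps=4000):
    """Grid point at which f_hat is largest."""
    f = gaussian_kde(data, h)
    xs, _ = _grid(data, h, steps)
    return max(xs, key=f)


if __name__ == "__main__":
    rng = random.Random(1)
    yields = [rng.gauss(3.0, 1.0) for _ in range(50)]   # hypothetical yields, t/ha
    print(statistics.fmean(yields), kde_mean(yields, 0.4), kde_mode(yields, 0.4))
```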

We assume that the pollution factor can reduce the yields of irrigated and nonirrigated land by 12% and 20%, respectively. Applying these reductions to the average yields obtained by the kernel method gives the expected yields of polluted irrigated and nonirrigated land (in tonnes/ha).

The yields $p$ and $q$ were simulated by the numerical Box-Muller method, taking into account the ranges of their values according to the FAO site (from 0 to 5 tonnes/ha for nonirrigated land and more than 5 tonnes/ha for irrigated land).
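The Box-Muller transform turns two independent uniform draws $U_1, U_2$ into two independent standard normal draws $\sqrt{-2\ln U_1}\cos(2\pi U_2)$ and $\sqrt{-2\ln U_1}\sin(2\pi U_2)$. The Python sketch below uses it to simulate yields restricted to a given range by simple rejection. The means and standard deviations in the example, and the use of rejection to enforce the FAO ranges (5 to 13 tonnes/ha for irrigated land, 0 to 5 tonnes/ha for nonirrigated land), are our assumptions rather than the settings of the study; the 12% and 20% pollution reductions stated above are then applied to the simulated means.

```python
import math
import random
import statistics


def box_muller(rng):
    """Two independent N(0, 1) draws from two independent U(0, 1) draws."""
    u1 = 1.0 - rng.random()            # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)


def simulate_yields(n, mean, sd, low, high, seed=0):
    """n draws from N(mean, sd^2), kept only when they fall in [low, high]."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        for z in box_muller(rng):
            y = mean + sd * z
            if low <= y <= high and len(out) < n:
                out.append(y)
    return out


if __name__ == "__main__":
    # Hypothetical parameters within the FAO ranges (t/ha).
    irrigated = simulate_yields(1000, mean=9.0, sd=2.0, low=5.0, high=13.0, seed=1)
    nonirrigated = simulate_yields(1000, mean=2.5, sd=1.0, low=0.0, high=5.0, seed=2)
    p, q = statistics.fmean(irrigated), statistics.fmean(nonirrigated)
    print(p, q, (1 - 0.12) * p, (1 - 0.20) * q)  # means without / with pollution
```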

The nonsimulated data series (areas of irrigated land) cover the period 2001-2016 (source: FAO site); the areas of irrigated land are then forecast over the period 2017-2036.

The total agricultural area (in million ha) is used to deduce the areas of nonirrigated land.

The total budget to be invested is forecast [22] with an ARIMA($p, d, q$) time series model [21] and then distributed between irrigation and the construction of silos.

If a process $(X_t)$ follows an ARIMA($p, d, q$) model, then it satisfies the relation

$$\Phi(B)(1 - B)^dX_t = \Theta(B)\,\varepsilon_t,$$

where:
(i) $(\varepsilon_t)$ is a white noise
(ii) $d$ is the order of differencing of the process
(iii) $B$ is the lag (backshift) operator: $BX_t = X_{t-1}$
(iv) $\Phi(B) = 1 - \phi_1B - \dots - \phi_pB^p$ is the autoregressive polynomial
(v) $\Theta(B) = 1 + \theta_1B + \dots + \theta_qB^q$ is the moving-average polynomial

If $(X_t)$ follows an ARIMA($p, d, q$) process, then $(1 - B)^dX_t$ follows an ARIMA($p, 0, q$) process, that is, an ARMA($p, q$) process.

In practice, according to the augmented Dickey-Fuller test, the series of budgets is not stationary (p-value > 5%): the null hypothesis of a unit root cannot be rejected, so the alternative hypothesis of stationarity is not retained.

After differencing twice, the series of investment budgets becomes stationary (p-value < 5%), as shown by the code in Figure 7.
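Differencing twice applies the operator $(1 - B)^2$, that is, $w_t = x_t - 2x_{t-1} + x_{t-2}$; forecasts of the differenced series are turned back into budget levels by inverting this relation, $x_t = w_t + 2x_{t-1} - x_{t-2}$. A minimal Python sketch (function names ours, hypothetical budget values) is given below; a series with a quadratic trend becomes constant after two differences, which is one reason why $d = 2$ can be needed.

```python
def difference(x, d=1):
    """Apply the difference operator (1 - B) to the sequence x, d times."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


def integrate_twice(w_forecasts, x_prev, x_last):
    """Rebuild levels from forecasts w of the second difference.

    x_prev and x_last are the last two observed levels x_{T-1} and x_T.
    """
    levels = []
    for w in w_forecasts:
        x_next = w + 2.0 * x_last - x_prev      # x_t = w_t + 2 x_{t-1} - x_{t-2}
        levels.append(x_next)
        x_prev, x_last = x_last, x_next
    return levels


if __name__ == "__main__":
    budgets = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]     # hypothetical quadratic trend
    print(difference(budgets, 2))                   # [2.0, 2.0, 2.0, 2.0]
    print(integrate_twice([2.0, 2.0], 25.0, 36.0))  # [49.0, 64.0]
```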

To determine the orders $p$ and $q$ of the series, we use the partial autocorrelation function (PACF) and the autocorrelation function (ACF), respectively (Figures 8 and 9).

From the two graphs, we obtain first candidate values of $p$ and $q$ (beyond these lags, the PACF and ACF are practically zero).
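The sample ACF and PACF used to read off $q$ and $p$ can be computed as follows. This Python sketch (names ours) uses the usual biased autocovariance estimator and obtains the partial autocorrelations with the Durbin-Levinson recursion; for a simulated AR(1) process with coefficient 0.7, the PACF should be close to 0.7 at lag 1 and close to zero afterwards, which is the "cut-off" pattern used to choose $p$.

```python
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (biased autocovariance / c_0)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) * (v - m) for v in x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations phi_kk, k = 1..max_lag (Durbin-Levinson recursion)."""
    r = acf(x, max_lag)
    phi = []          # phi[j] holds phi_{k-1, j+1} from the previous step
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    x = [0.0]
    for _ in range(5000):                       # AR(1) with coefficient 0.7
        x.append(0.7 * x[-1] + rng.gauss(0.0, 1.0))
    print([round(v, 3) for v in acf(x, 3)], [round(v, 3) for v in pacf(x, 3)])
```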

To confirm the choice of the appropriate model, we build the AIC (Akaike Information Criterion) matrix of the ARIMA($p$, 2, $q$) models, with $p$ ranging from 1 to 3 and $q$ ranging from 1 to 3, and retain the model with the minimum AIC (Figure 10).

The AIC matrix confirms that the model to be retained is ARIMA (2, 2, 1).

The forecast of this series in R over the period 2017-2036 is shown in Figure 11.

The budgets allocated to irrigation and to the construction of silos over the period 2001-2016 are built from the unit prices of irrigation and of silo construction, respectively; their forecasts (2017-2036) are then obtained by simulating the distribution of the forecast investment budgets between irrigation and the construction of silos with numerical simulation methods (we carry out 4 distribution simulations of the total investment budget).
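One simple way to simulate such distributions is sketched below in Python: for every forecast year, a share $s$ drawn uniformly in $[0, 1]$ of the total budget goes to irrigation and the remaining $1 - s$ goes to silo construction. The uniform share is an illustrative assumption, and the function name and budget values are hypothetical.

```python
import random


def simulate_allocations(total_budgets, n_strategies, seed=0):
    """Return n_strategies pairs (irrigation, silos) of yearly budget lists.

    For each year, a share s ~ U(0, 1) of the total goes to irrigation and
    the rest to silo construction, so both parts always add up to the total.
    """
    rng = random.Random(seed)
    strategies = []
    for _ in range(n_strategies):
        irrigation, silos = [], []
        for total in total_budgets:
            s = rng.random()
            irrigation.append(s * total)
            silos.append((1.0 - s) * total)
        strategies.append((irrigation, silos))
    return strategies


if __name__ == "__main__":
    forecast = [100.0, 110.0, 121.0]     # hypothetical yearly totals
    for irr, sil in simulate_allocations(forecast, 3, seed=42):
        print([round(v, 1) for v in irr], [round(v, 1) for v in sil])
```

Each simulated strategy can then be evaluated for its yearly defaults and passed to the dominance filter of Section 6.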

The product requirements for each year of the period 2017-2036 are simulated; after calculating in R the "delta" deviations between production and requirements, we calculate the default risks over the period 2017-2036.

An example of the calculation of the default risks, together with the number of simulations used (the quantities involved are mentioned in Figure 1), is given in Figure 12.

The table of values (tab) of the simulated strategies follows the structure of Figure 2 (Figure 13); in each simulation iteration, we obtain a distribution of the total investment budget between the construction of silos and irrigation over the next 20 years.

The strategies belonging to the so-called efficient Pareto front, calculated following Figure 1, are presented in Figure 14; as the algorithm indicates, strategies that do not belong to the front are assigned a missing value (NA).

After eliminating the missing values (NA) produced by the Pareto front algorithm, we obtain the strategies of the front (Figure 15).

Using the program coded in R [23], we plot the strategies obtained following Figure 1 and the Pareto front obtained following Figure 3 (Figure 16).

8. Conclusion

Therefore, each point of the Pareto front corresponds to a vector of irrigation and silo construction amounts, and it is up to the decision-maker to choose a distribution of the total investment budget between irrigation and the construction of silos, that is, a strategy belonging to the Pareto front, so as to minimize the future agricultural default risks in the case of Bangladesh. In this article, the min-max tool was proposed as an appropriate formulation for determining so-called "optimal" strategies, especially when the problem is posed in vector rather than scalar form; to give an exhaustive study, other mathematical tools were also applied, such as nonparametric statistics and ARIMA time series.

Data Availability

All data are available on the FAO website.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] Aquastat (FAO), “Paper notes of Bangladesh,” http://www.fao.org/nr/water/aquastat/countries_regions/bgd/BGD-CP_eng.pdf.
[2] “Data imported from the FAO irrigated land site,” http://www.fao.org/faostat/en/#data/RL.
[3] “Unit prices for irrigation in Bangladesh,” http://www.fao.org/nr/water/aquastat/countries_regions/BGD/.
[4] “Unit prices for the construction of silos,” http://www.fao.org/3/t1838e/T1838E1c.htm.
[5] “The yield ranges of irrigated and non-irrigated land,” http://www.fao.org/3/Y3918E/y3918e10.htm.
[6] A.-S. Dalalyan, Statistique Avancée: Méthodes Non-Paramétriques, Ecole Centrale de Paris, 2016.
[7] M. Péchaud, “Comparaisons de fonctions,” 2008-2009, http://mickaelpechaud.free.fr/cours/comparaison.pdf.
[8] A.-B. Tsybakov, Introduction to Nonparametric Estimation, Springer Science+Business Media, New York, USA, 2009.
[9] B.-E. Hansen, Lecture Notes on Nonparametrics, Springer, University of Wisconsin, 2009.
[10] F. Ferraty and P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice, Springer Science+Business Media, New York, USA, 2006.
[11] M.-P. Wand and M. C. Jones, Kernel Smoothing, Monographs on Statistics and Applied Probability 60, Chapman & Hall, London, UK, 1995.
[12] A. G. Bors and N. Nasios, “Kernel bandwidth estimation for nonparametric modeling,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6, pp. 1543–1555, 2009.
[13] N. N. Moiseev, “Mathematical Problems of System Analysis,” Tech. Rep., Nauka, Moscow, 1981.
[14] B. Huang, P. Fery, L. Xue, and Y. Wang, “Seeking the Pareto front for multi-objective spatial optimization problems,” International Journal of Geographical Information Science, vol. 22, no. 5, pp. 507–526, 2008.
[15] T. Grabener and A. Berro, Optimisation multiobjectif discrète par propagation de contrainte, Toulouse University, Toulouse, France, 2008.
[16] A. Chinchuluun, P. M. Pardalos, and L. Pitsoulis, “Pareto Optimality, Game Theory and Equilibria,” in Springer Optimization and Its Applications, Springer, New York, USA, 2008.
[17] A. Benki, Méthodes efficaces de capture de front de Pareto en conception mécanique multicritère: applications industrielles, Université Nice Sophia Antipolis, 2014, HAL archives-ouvertes.fr.
[18] J. Dipama, Optimisation Multi-Objectif Des Systèmes Énergétiques, de Montréal University, Montréal, Canada, 2010.
[19] G. Guillopé, Optimisation sous contrainte, Laboratoire de mathématiques Jean Leray, Département de mathématiques, Nantes University, Nantes, France, 2015-2016.
[20] R. Rakotomalala, Tests de normalité: Techniques empiriques et tests statistiques, Version 2.0, Lumière Lyon 2 University, Lyon, France, 2011.
[21] A. Charpentier, Cours de Séries Temporelles Théorie et Applications (volume 2), Paris Dauphine University, DESS Actuariat & DESS Mathématiques de la Décision, 2018.
[22] B. El Griny, G. Kouaiba, M. Imegri, Y. El Qalli, and D. Mentagui, “Les Modèles ARCH et GARCH: Application au CAC40,” Iejmae, vol. 8, no. 3, pp. 36–51, 2017.
[23] “R project website,” http://www.r-project.org.

Copyright

Copyright © 2021 Ghizlane Kouaiba and Driss Mentagui. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
