Abstract

Nitrogen rate trials are often performed to determine the economically optimum N application rate. For this purpose, the yield is modeled as a function of the N application. Regression analysis is commonly used to estimate the modeled functions and the economic optimum N rate, $N_{\text{opt}}$. However, computer programs do not calculate confidence intervals for $N_{\text{opt}}$ derived from the quadratic yield response model or other commonly used models. The objective was to develop a method for computing and interpreting confidence intervals for $N_{\text{opt}}$. These confidence intervals can be computed with the program VINO.EXE, which is available online. For the N rate trials on the experimental field Sieblerfeld (Bavaria), confidence intervals based on the quadratic model were computed for a range of wheat and N fertilizer prices and for selected designs of N rates, and they were compared with confidence intervals based on the linear-plus-plateau yield regression model. All intervals were found to be unexpectedly wide, and their ranges were affected by the N rates used in the calculations and by the choice of yield response model.

1. Introduction

The effect of N fertilizer on the yield of agricultural crops can be studied using N response functions. Such functions are usually fitted to the data from N rate trials by regression. Function types commonly used for this purpose include the quadratic (e.g., [1, 2]) and the linear-plus-plateau functions (e.g., [3]) as well as the Mitscherlich function, which is a kind of exponential model [4, 5]. Other researchers have additionally investigated the quadratic-plus-plateau model [6, 7], the square-root model [6, 8], and even more complicated models [9]. On the basis of the fitted production functions, the economically optimum N application rate is estimated.

The economic optimum is reached when the marginal cost of the fertilization equals the marginal revenue, that is, when the returns above the N fertilizer cost (RANC) are maximized. For a given product price $p_Y$ (per t) and a fertilizer price $p_N$ (per kg N), these returns are computed as

$$\mathrm{RANC}_i = p_Y Y_i - p_N N_i, \tag{1}$$

where $Y_i$ is the measured yield in t ha$^{-1}$ and $N_i$ is the N rate in kg N ha$^{-1}$. The N rate at which these returns above the fertilizer cost are maximized is the economically optimal N rate, $N_{\text{opt}}$.

The evaluation of any N rate trial is usually followed by an analysis of the residuals and of the coefficient of determination, $R^2$, to justify the choice of the model applied. However, as shown by Cerrato and Blackmer [6], $R^2$ is not a suitable measure, as it barely depends on the chosen model. Moreover, the point estimate for $N_{\text{opt}}$ derived from the fitted model provides no information on its accuracy or reliability. Therefore, our objective is to compute and discuss confidence intervals for $N_{\text{opt}}$ based on quadratic response functions, using extensive N rate trials as an example. The results can be used for optimizing decision making in nitrogen management. Furthermore, from an ecological point of view, knowledge of these confidence intervals for $N_{\text{opt}}$ is extremely important for identifying N rate recommendations from the lower limit of these ranges, so that economically unnecessary N balance surpluses are avoided.

2. The Method for Computing the Confidence Interval

2.1. A Quadratic Model for the Yield

It was assumed that the expected yields, E($Y$), can be described by a quadratic function of the total N application rates, $N$. Therefore, the yields, $Y_i$, $i = 1, \ldots, n$, were modeled as random variables that depend on $N_i$ in the following way:

$$Y_i = \beta_0 + \beta_1 N_i + \beta_2 N_i^2 + \varepsilon_i, \quad i = 1, \ldots, n. \tag{2}$$

Here, $N_i$ denotes the fixed levels of N application rates, $\beta_0$, $\beta_1$, $\beta_2$ the fixed unknown regression coefficients, and $\varepsilon_i$ the error variables, which are assumed to be independent and normally distributed with expected value 0 and a common unknown variance $\sigma^2$. The unknown coefficients $\beta_0$, $\beta_1$, $\beta_2$ were estimated by the least-squares estimates $\hat\beta_0$, $\hat\beta_1$, $\hat\beta_2$. This requires at least three different N levels; otherwise, the estimates would not be unique.

2.2. The Optimum Application Rate and Its Point Estimate

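The economically optimal N application rate, $N_{\text{opt}}$, is the rate at which the expected returns above fertilizer cost, E(RANC) $= p_Y\,$E($Y$) $- p_N N$, are maximized. This applies to the following optimum rate:

$$N_{\text{opt}} = \frac{p_N/p_Y - \beta_1}{2\beta_2}. \tag{3}$$

$N_{\text{opt}}$ is the rate $N$ at which the first derivative of the model parabola, E($Y$)$' = \beta_1 + 2\beta_2 N$, equals $p_N/p_Y$, the ratio between the fertilizer price and the product price.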
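The optimum-rate formula can be illustrated with a minimal sketch. The function names, the coefficients, and the prices below are hypothetical values chosen only for illustration; they are not estimates from the Sieblerfeld trial.

```python
def expected_yield(n_rate, b0, b1, b2):
    """Quadratic yield model E(Y) = b0 + b1*N + b2*N**2."""
    return b0 + b1 * n_rate + b2 * n_rate ** 2


def expected_ranc(n_rate, b0, b1, b2, p_y, p_n):
    """Expected returns above fertilizer cost: p_Y * E(Y) - p_N * N."""
    return p_y * expected_yield(n_rate, b0, b1, b2) - p_n * n_rate


def economic_optimum(b1, b2, p_y, p_n):
    """N rate at which the slope b1 + 2*b2*N equals the price ratio p_N/p_Y."""
    if b2 >= 0:
        raise ValueError("a profit maximum requires a concave parabola (b2 < 0)")
    return (p_n / p_y - b1) / (2.0 * b2)


# Hypothetical coefficients (yield in t/ha, N rate in kg/ha) and prices;
# none of these numbers come from the trial.
B0, B1, B2 = 4.0, 0.04, -0.0001
P_Y, P_N = 100.0, 0.6  # currency units per t of grain and per kg of N

n_opt = economic_optimum(B1, B2, P_Y, P_N)
ranc_at_opt = expected_ranc(n_opt, B0, B1, B2, P_Y, P_N)
```

Evaluating `expected_ranc` slightly to the left and right of `n_opt` gives smaller values, which is a quick check that the rate found is a profit maximum for these assumed inputs.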

Replacing the coefficients $\beta_j$ by their least-squares estimates $\hat\beta_j$ immediately gives the point estimate for $N_{\text{opt}}$:

$$\hat N_{\text{opt}} = \frac{p_N/p_Y - \hat\beta_1}{2\hat\beta_2}. \tag{4}$$

Note that this estimator is biased. Each coefficient estimator $\hat\beta_j$ in this linear model is unbiased, and so is the estimated yield, as it is a linear combination of these coefficient estimators. The estimated N optimum in (4), however, is not a linear combination of the $\hat\beta_j$ but essentially a ratio, so that unbiasedness no longer holds. To see this, assume for simplicity that $p_N/p_Y = 0$ and that E($\hat\beta_1$) $= \beta_1 = 1$ and E($\hat\beta_2$) $= \beta_2 = -0.5$, so that the ratio of the expected values in (4) is equal to 1. Assume further that the variance of $\hat\beta_1$ is negligible, so that $\hat\beta_1 \approx 1$, and consider the following two $\hat\beta_2$-values: $\hat\beta_2 = -0.1$ and $\hat\beta_2 = -0.9$, which are symmetric around $\beta_2 = -0.5$. In the former case, the ratio in (4) is 5, in the latter case around 0.56. The mean of both values is 2.78. If $\hat\beta_2$ is additionally assumed to be always negative, this suggests that the expected ratio in (4) is greater than $N_{\text{opt}}$, which, according to (3), has the value 1.
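The arithmetic of this bias example can be checked with a few lines. The slope estimates of -0.1 and -0.9 used below are the values implied by the quoted ratios of 5 and about 0.56 with a price ratio of zero; the function name is hypothetical.

```python
def ratio_estimate(b1_hat, b2_hat, price_ratio=0.0):
    """Point estimate (price_ratio - b1_hat) / (2 * b2_hat) of the optimum."""
    return (price_ratio - b1_hat) / (2.0 * b2_hat)


b1_hat = 1.0  # variance assumed negligible, so the estimate equals beta_1

# Ratio of the expected values: beta_1 = 1, beta_2 = -0.5.
ratio_of_expectations = ratio_estimate(b1_hat, -0.5)

# Two estimates of beta_2 placed symmetrically around -0.5.
ratio_a = ratio_estimate(b1_hat, -0.1)
ratio_b = ratio_estimate(b1_hat, -0.9)
mean_of_ratios = (ratio_a + ratio_b) / 2.0
```

Although the two slope estimates average to the true value, `mean_of_ratios` lies well above `ratio_of_expectations`, which is the upward bias described in the text.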

2.3. Deriving an Exact Confidence Interval for $N_{\text{opt}}$

The uncertainty of a point estimate, $\hat N_{\text{opt}}$, can be seen from a confidence interval for $N_{\text{opt}}$. According to Lehmann's [10] duality between confidence intervals and tests, a confidence interval or, more generally, a confidence set for $N_{\text{opt}}$ consists of all hypothetical values $N_0$ whose simple null hypothesis,

$$H_0\colon N_{\text{opt}} = N_0, \tag{5}$$

cannot be rejected. By using (3), the null hypothesis in (5) can be rewritten as

$$H_0\colon \beta_1 + 2 N_0 \beta_2 = \frac{p_N}{p_Y}. \tag{6}$$

This is a linear hypothesis, so it can be tested in the framework of general linear models using the usual $F$-test [11], which is a likelihood ratio test. Therefore, the corresponding confidence set is called a likelihood interval, provided that it is an interval. This is usually the case, but it need not be. As mentioned before, the set is formed by all $N_0$ for which $H_0$ in (5) or (6) cannot be rejected. For readers interested in statistical inference, this test and the derivation of the corresponding confidence set, which can be expressed by explicit mathematical formulae, are described fully in Bachmaier [12]. The confidence set is analogous to that of Fieller [13], who calculated it for the ratio of parameters, and to that of Koziol and Zielinski [14], who computed it for the maximum of a quadratic regression.
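To make the inversion of the test concrete, the following sketch writes out the $F$-statistic for the linear hypothesis in (6) and the resulting set of non-rejected values $N_0$. The notation is assumed here for illustration: $r = p_N/p_Y$, $s^2$ is the residual mean square with $n-3$ degrees of freedom, $v_{jk}$ are the elements of $(X^\top X)^{-1}$ belonging to $\hat\beta_j$ and $\hat\beta_k$, and $F_{1,n-3;1-\alpha}$ is the corresponding quantile. The explicit formulae in Bachmaier [12] may be arranged differently.

```latex
% Sketch under assumed notation; not copied from the cited derivation.
\begin{align*}
  T(N_0) &= \frac{\bigl(\hat\beta_1 + 2N_0\hat\beta_2 - r\bigr)^2}
                 {s^2\,\bigl(v_{11} + 4N_0 v_{12} + 4N_0^2 v_{22}\bigr)}
          \;\sim\; F_{1,\,n-3} \quad\text{under } H_0, \\[4pt]
  \text{confidence set} &= \bigl\{N_0 : T(N_0) \le F_{1,\,n-3;\,1-\alpha}\bigr\}
          = \bigl\{N_0 : a N_0^2 + b N_0 + c \le 0\bigr\}, \\[4pt]
  q &= s^2\,F_{1,\,n-3;\,1-\alpha}, \\
  a &= 4\bigl(\hat\beta_2^2 - q\,v_{22}\bigr), \\
  b &= 4\bigl(\hat\beta_2(\hat\beta_1 - r) - q\,v_{12}\bigr), \\
  c &= (\hat\beta_1 - r)^2 - q\,v_{11}.
\end{align*}
```

If $a > 0$, that is, if $\hat\beta_2$ differs significantly from zero at level $\alpha$, the set is the interval between the two roots of the quadratic. Otherwise the set is unbounded, which corresponds to the cases in which the confidence set is not an interval.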

Under the assumptions made (quadratic model, independent and normally distributed homoscedastic errors), tests of linear hypotheses such as $H_0$ in (6) are exact, so exact confidence intervals are obtained. In contrast, confidence intervals derived by the delta method according to Weisberg (2005, sec. 6.1.2) [11], Mittelhammer et al. (2000, sec. 8.8, pp. 183–185) [15], and Casella and Berger [16], which are also called Wald intervals, are only approximate. They are symmetric around the point estimate, which is not even unbiased. Therefore, they “may not accurately reflect the actual, often asymmetric, uncertainty in an estimate” [17]. Such an asymmetric situation is given here, as seen with rapidly increasing fertilizer application rates: the true yield function decreases more slowly than indicated by the quadratic model, which overestimates the yield loss due to lodging [18]. Therefore, the delta method is not pursued here; instead, the likelihood intervals are used as described above. These intervals are not symmetric around the point estimate, so neither limit depends on the other and each can adapt better to the data around it. The lower limit of the confidence interval is the focus of ecological interest, as it gives the minimum fertilization that cannot be rejected as being optimal.

2.4. Program Implementation

The computation of the likelihood-type confidence set for $N_{\text{opt}}$ was implemented in the Fortran program VINO.EXE. The program CIGIGRAD.EXE could also be used. It is not specialized in optimizing N applications, but it computes a confidence set for the $x$-coordinate at which the modeled regression parabola has a target gradient, which must be entered by the user. Note that, to compute a confidence set for $N_{\text{opt}}$ using CIGIGRAD.EXE for a given fertilizer price and crop price, the target gradient $p_N/p_Y$ must be entered as the plain number that expresses it in the units of the data, that is, in yield units per unit of N rate, and not as the number that results from the price ratio after converting both prices to the same mass unit. A special case of CIGIGRAD.EXE is CIVERTEX.EXE, which calculates the confidence set for the $x$-coordinate of the parabola vertex. This $x$-coordinate corresponds to $N_{\text{opt}}$ if the $x$-values are the N levels and the $y$-values are the returns above fertilizer cost, RANC$_i$ in (1), which are to be maximized. Thus, CIVERTEX.EXE can also be used for computing a confidence set for $N_{\text{opt}}$. All programs mentioned can be downloaded from the internet [19].
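The two routes to the optimum described here, a target gradient on the yield parabola and the vertex of the RANC parabola, give the same rate. The sketch below only checks this relation for point values; it is not the code of VINO.EXE, CIGIGRAD.EXE, or CIVERTEX.EXE, and all function names, coefficients, and prices are hypothetical.

```python
def gradient_in_data_units(p_n_per_kg, p_y_per_t):
    """Price ratio p_N/p_Y as a slope in t of yield per kg of N."""
    return p_n_per_kg / p_y_per_t


def rate_at_target_gradient(b1, b2, gradient):
    """x-coordinate where the parabola b0 + b1*x + b2*x**2 has the given slope."""
    return (gradient - b1) / (2.0 * b2)


def ranc_parabola(b0, b1, b2, p_y, p_n):
    """Coefficients of RANC(N) = p_Y*(b0 + b1*N + b2*N**2) - p_N*N."""
    return p_y * b0, p_y * b1 - p_n, p_y * b2


def vertex(c1, c2):
    """x-coordinate of the vertex of c0 + c1*x + c2*x**2."""
    return -c1 / (2.0 * c2)


# Hypothetical values: yields in t/ha, rates in kg/ha,
# fertilizer price per kg of N, crop price per t of grain.
b0, b1, b2 = 4.0, 0.04, -0.0001
p_n, p_y = 0.6, 100.0

gradient = gradient_in_data_units(p_n, p_y)  # slope in t per kg
via_gradient = rate_at_target_gradient(b1, b2, gradient)
_, c1, c2 = ranc_parabola(b0, b1, b2, p_y, p_n)
via_vertex = vertex(c1, c2)  # same rate, found as a vertex
```

With yields in t per ha and rates in kg per ha, the gradient is a small number in t per kg, which is the scale on which the slope of the fitted parabola is measured.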

3. The Field Site

The test field, Sieblerfeld (5 ha), is in the Tertiary hills of Upper Bavaria, Germany, and has two very different yield zones. The soil texture in the high-yield zone is a sandy loam with an available field capacity of the rooted soil horizons of 160 mm. In the low-yield zone, the soil texture is a loamy sand with an available field capacity of the rooted soil horizons of 100 mm [20]. Within the two distinct yield zones, N-rate test areas were designated to derive site-specific N-response functions. For this, N-rate trials were carried out in the season 2001/02 on winter wheat (Triticum aestivum L.).

The trial design was a randomized complete block design with four blocks in each yield zone. To investigate the dependence between yield and fertilizer rate, 11 plots receiving different rates of N (0, 80, 100, 120, 140, 160, 180, 200, 220, 240, and 260 kg N ha$^{-1}$) were randomly arranged within each block in both zones, so that there were $n = 44$ yield measurements in each yield zone. The yields were measured with a plot combine. The plots were 12.5 m$^2$ in size, with a length of 10 m and a width of 1.25 m. Figure 1, which has been adopted from Figure 1 in Bachmaier and Gandorfer [21], illustrates the randomization results at the midpoints of the plots; these were recorded with a precision of 1 m with GPS technology.

Randomized complete block designs are preferable to complete block strip trials because only the former ensure independence of the variable in question. Designs of the latter kind are widely used [22–24], but their lack of randomization within strips can result in the type of strip heteroscedasticity and correlation identified in Hurley et al. [22]. In particular, correlations within strips occur in monitored yield data [1, 24, 25] because of the threshing system's mass flow and the yield monitor's data preprocessor.

On the other hand, complete blocking and randomization do not usually ensure complete independence, because neighboring plants at the low N rate usually rob N from the plants at the high N rate, creating the so-called border effect. Since these treatments are not widely separated in space (Figure 1), the yield measured for both treatments is affected, and randomization cannot solve this problem.

4. Results

4.1. Assumption Check

The usual assumptions have been checked in Bachmaier and Gandorfer [21], where the same trial was evaluated to test the hypothesis that there is no difference between the optimal N rates in the high- and low-yield zones (precision agriculture hypothesis) and to compute confidence intervals for the difference of the optima. The result was that the data can be assumed to be approximately normally distributed and homoscedastic; furthermore, a block effect in the randomized complete block design could not be detected. Therefore, the ordinary regression analysis and the methods described above for computing confidence sets for the optima can be applied, but the lack of complete independence on account of the border effect mentioned above will distort this analysis slightly.

4.2. The Data and the Quadratic Regression Function

Figure 2, which has already been shown in Bachmaier and Gandorfer [21], presents the trial results and the quadratic regression curves as estimates of the yield functions for the high- and low-yield zones when all N levels are considered. The adjusted coefficient of determination in high-yield sites is ; the low-yield sites have an adjusted $R^2$ of 0.726.

The narrow boxes in Figure 2 indicate the confidence intervals for $N_{\text{opt}}$ in both yield zones, which overlap. This, however, does not mean that a statistical proof for different economic optima in both zones cannot be provided. Such a proof would be obtained if the confidence interval for the difference of the optima in both zones did not contain the value zero [21]. In the following subsections, only separate confidence intervals for $N_{\text{opt}}$ in both yield zones are computed and analyzed.

4.3. The Asymmetry of the Confidence Intervals

Figure 2 also shows the confidence intervals for $N_{\text{opt}}$ when the ratio between fertilizer price and crop price is . This applies, for example, to and . This ratio corresponds to the quadratic model's target slope that is to be reached by the optimum fertilization. It is indicated by dotted lines in Figure 2.

A likelihood ratio confidence set need not be symmetric around the point estimate. The confidence interval [204 kg N ha$^{-1}$, 328 kg N ha$^{-1}$] for the high-yield zone extends only 36 kg N ha$^{-1}$ to the left of the point estimate, whereas to the right it extends 98 kg N ha$^{-1}$, which is nearly three times as long. This is, above all, due to the fact that yields at very high N rates do not seem to sink. For the low-yield zone, a point estimate of 199 kg N ha$^{-1}$ and a confidence interval of [173 kg N ha$^{-1}$, 260 kg N ha$^{-1}$] present a similar situation with regard to the asymmetry. It seems that a concave parabola with a vertex far to the right of the point estimate is easily compatible with the measured data, whereas a concave parabola with a vertex far to the left of the point estimate is not suited for fitting the yields, since the yields to the right of such a vertex do not sink. Therefore, the proposed method for determining exact confidence intervals fits the measured data well.

4.4. The Length of the Confidence Intervals

As can be seen in Figure 2, the 95% confidence interval for $N_{\text{opt}}$ in the low-yield zone, [173 kg N ha$^{-1}$, 260 kg N ha$^{-1}$], is somewhat shorter than the 95% confidence interval for $N_{\text{opt}}$ in the high-yield zone, [204 kg N ha$^{-1}$, 328 kg N ha$^{-1}$]. This can be explained as follows. In contrast to the high-yield zone, the higher N doses in the low-yield zone did not result in further increases in yield; a small yield depression could even be observed at 260 kg N ha$^{-1}$, which forced the regression function to go down. Consequently, the parabola maximum can be determined more clearly, so that a long confidence interval is avoided. Higher N rates in the high-yield zone likewise have a shortening effect on the confidence interval for $N_{\text{opt}}$. This is also demonstrated in Table 2, where confidence sets are compared with regard to different designs of the N levels. It was observed that omitting the maximum application rate as well as omitting the zero application rate enlarged the confidence set enormously.

The enormous length of both 95% confidence intervals makes clear that, even in such extensive N rate trials, the optimum N rates can only be roughly estimated ex post. It is not possible, in retrospect, to narrow down the optimal nitrogen quantity to a range shorter than 87 kg N ha$^{-1}$ in the low-yield zone or 124 kg N ha$^{-1}$ in the high-yield zone. The length of the confidence intervals results from the fact that N fertilization has a very wide range of marginal profit: from the economic point of view, the cost of the additional N quantity cancels out the economic advantage of the increase in yield. This leads to a very flat function around the optimum for the returns above fertilizer cost, making it compatible with many model parabolas with widely ranging vertices. The set of these vertices is the confidence interval, which is, therefore, very long.
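The flatness of the profit function around its maximum can be made explicit under the quadratic model (2). In the short derivation below, $d$ is an assumed symbol for the deviation of an applied rate from the optimum.

```latex
% Sketch: expected profit loss when the optimum is missed by d (quadratic model).
\begin{align*}
  \mathrm{E}(\mathrm{RANC})(N)
    &= p_Y\bigl(\beta_0 + \beta_1 N + \beta_2 N^2\bigr) - p_N N, \\
  \frac{\mathrm{d}}{\mathrm{d}N}\,\mathrm{E}(\mathrm{RANC})(N)\Big|_{N = N_{\text{opt}}}
    &= p_Y\bigl(\beta_1 + 2\beta_2 N_{\text{opt}}\bigr) - p_N = 0, \\
  \mathrm{E}(\mathrm{RANC})(N_{\text{opt}}) - \mathrm{E}(\mathrm{RANC})(N_{\text{opt}} + d)
    &= -\,p_Y\,\beta_2\,d^2 = p_Y\,\lvert\beta_2\rvert\,d^2 .
\end{align*}
```

The expected loss grows only with the square of the deviation $d$ and is scaled by the curvature $\lvert\beta_2\rvert$. When the curvature is small, a wide range of rates is almost as profitable as the optimum, which is why the data cannot locate the optimum sharply.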

4.5. Effects of Fertilizer Price Increase on Confidence Intervals

Table 1 shows how the point estimate and the confidence intervals for $N_{\text{opt}}$ depend on the ratio between fertilizer and crop price, $p_N/p_Y$. The table also covers the case $p_N/p_Y = 0$, where the fertilizer is free of charge, so that the corresponding point estimates and confidence intervals for $N_{\text{opt}}$ are point estimates and confidence intervals for the yield maximum, which then coincides with the economic optimum. Table 1 clearly shows that an increase in fertilizer prices or, more generally, an increase in the price ratio $p_N/p_Y$ leads to a lower N use: $N_{\text{opt}}$ moves further to the left. In the high-yield zone, for example, a doubling of the price ratio from to , which may occur when the N price doubles while the winter wheat price remains unchanged, reduces the point estimate from 236 kg N ha$^{-1}$ to 190 kg N ha$^{-1}$, thus shifting the 95% confidence interval from [201 kg N ha$^{-1}$, 321 kg N ha$^{-1}$] to [167 kg N ha$^{-1}$, 240 kg N ha$^{-1}$]. The confidence interval length is reduced from 120 kg N ha$^{-1}$ to 74 kg N ha$^{-1}$ because the interval is moved away from the area where the true yield function seems to be relatively flat. The modeled parabolic curve for the yield at this $N_{\text{opt}}$ is no longer close to the horizontal but has a greater positive gradient. In this region, the model parabola appears to come much closer to the real situation. However, the area where the true unknown yield function is relatively flat seems to be much wider than indicated by the fitted parabola. This was also criticized by Boyd et al. [18], who pointed out that quadratic yield functions overestimate the yield loss due to lodging, that is, the true production function does not sink as quickly as a modeled parabola.

4.6. Regression Analysis When Omitting the Zero Application Rate

It is justifiable to omit N levels below 80 kg N ha$^{-1}$ in order to determine the economic optimum, as it is not reached at low N rates. The area to be fitted around the unknown true optimum then becomes smaller, and a smaller area can be better approximated by simple functions such as quadratic ones, which never model the response precisely over a wide area.

Figure 3 shows the trial results and the quadratic regression curves for both yield zones when the results of the zero rate are disregarded. The trial results of the other ten N levels, 80, 100, 120, ..., 260 kg N ha$^{-1}$, were used for the analysis. The adjusted coefficient of determination in high-yield sites is ; the low-yield sites give only a low adjusted $R^2$ of 0.320.

4.7. The Influence of the Rate Trial Design on the Confidence Sets

In order to analyze the influence of the N rate trial design on the point estimate and the confidence set for the economically optimum N rate, point estimates and confidence intervals for $N_{\text{opt}}$ were calculated for differing sets of N levels. The results are shown in Table 2. Data from the low- and high-yield sites were processed separately, independent of each other.

4.7.1. Omitting the Zero Rate

If zero fertilization is not taken into account in the estimation, this leads to a higher estimate of the economically optimum N rate, $N_{\text{opt}}$, and to a lengthening of the confidence interval. The 95% and 99% confidence sets shown are no longer intervals. They consist of all real numbers with or even without the exception of an interval. The latter applies to the 99% confidence set in the low-yield zone. The reason for such degenerate confidence sets is that the shape of the parabola around which the data disperse, concave or convex, cannot be clearly recognized when the zero rate is omitted. The data could be fitted passably by a line with a positive gradient and thus, almost equally well, by very wide parabolas whose vertices lie far to the left if they are convex and far to the right if they are concave. In this way, a gap-type confidence set arises, which contains abscissas of profit maxima (on the right of the gap) as well as of profit minima (on the left of the gap).
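The three shapes of confidence set mentioned here can be reproduced with a small sketch. It assumes, as for Fieller-type sets, that inverting the test leads to a quadratic inequality a*N0**2 + b*N0 + c <= 0 in the hypothetical value N0; the function name and the coefficients used in the usage lines are hypothetical and do not come from the trial data.

```python
import math


def describe_confidence_set(a, b, c):
    """Describe the solution set of a*N0**2 + b*N0 + c <= 0.

    Returns (kind, lower, upper). For kind == "interval" the set is
    [lower, upper]; for kind == "complement of an interval" the set is
    everything outside the open gap (lower, upper).
    """
    if a == 0:
        raise ValueError("degenerate case a == 0 is not handled in this sketch")
    disc = b * b - 4.0 * a * c
    if a > 0:
        # Parabola opens upward: the inequality holds between the roots.
        if disc < 0:
            return ("empty", None, None)  # cannot occur for a test inversion
        root = math.sqrt(disc)
        return ("interval", (-b - root) / (2.0 * a), (-b + root) / (2.0 * a))
    # a < 0: parabola opens downward.
    if disc <= 0:
        # The quadratic is never positive: no value can be rejected.
        return ("whole real line", None, None)
    root = math.sqrt(disc)
    lower, upper = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    return ("complement of an interval", lower, upper)


# Hypothetical coefficients illustrating the three cases.
bounded = describe_confidence_set(1.0, -400.0, 30000.0)
gap_type = describe_confidence_set(-1.0, 400.0, -30000.0)
everything = describe_confidence_set(-1.0, 0.0, -1.0)
```

The sign of `a` decides between a bounded interval and an unbounded set, so a design in which the curvature of the parabola is poorly determined produces the gap-type or whole-line sets described above.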

The gap can disappear if, in addition, the error probability is so “slight” that every vertex can be considered as being “slightly” compatible with the data. In practice, all these confidence sets are, of course, worthless. Based on these findings, feasible fertilization rules cannot be given at a high level of confidence. The gap type or the extreme length of these confidence sets, without zero fertilization, indicates that the function for the returns above fertilizer cost, which is to be maximized, must be relatively flat in the area of its maximum, making the maximum very difficult to locate.

4.7.2. Considering All Levels

If, however, the zero rate is taken into account, the parabola is forced to sink as a result of the low yield under no fertilization, and not just to the left of the maximum: the symmetry of the parabola means that it sinks to the right as well. In this way, the region around the maximum is modeled as less flat than it appears to be in reality. The shorter confidence interval that results from this may therefore partly be attributed to a weakness of the quadratic model in the area of the optimum. By excluding zero fertilization, however, the model is limited to the small area of interest, and the weakness is then of little importance. The smaller the region, the better it can be modeled by the parabola. On the one hand, the confidence interval is extended when the zero fertilization is omitted; on the other hand, it is also made more trustworthy, since it is less influenced by the model assumptions.

4.7.3. Omitting the Highest Tested Rate

If the highest tested N level is excluded, a minor reduction in the point estimate of the optimum N rate can be seen in comparison to when all N levels are included (Table 2). Nevertheless, the confidence interval extends upwards, although the lower boundary and the point estimate decrease. The highest tested N level is still in the vicinity of the optimum. If test values are missing in this area, there is great uncertainty with regard to a possible maximum on the far right. A downward-facing parabola with its vertex far to the right can be fitted if the parabola is not forced to adapt to the data in this region. As a result, the confidence interval extends considerably to the right when the highest tested N level is excluded. To ensure a sensible calculation of the confidence interval, it is essential to take high N levels into account.

4.8. Comparison of Confidence Sets to the Linear-Plus-Plateau Model

In the introduction, five function types used for modeling the yield response are mentioned (quadratic, linear-plus-plateau, Mitscherlich, quadratic-plus-plateau, square-root). The evaluation of N rate trials based on these models generally concerns the point estimates for $N_{\text{opt}}$. Usually, confidence intervals are only computed in the linear-plus-plateau model, where $N_{\text{opt}}$ is one of the model parameters. The confidence intervals of Hernandez and Mulla [2], however, are also based on the quadratic model, but they were not compared with those based on the linear-plus-plateau model. Such a comparison, which is made in the following, reveals large differences.

Figures 4 and 5 show the data of both yield zones as fitted by the linear-plus-plateau functions. Figure 4 takes all 11 levels into account, whereas Figure 5 excludes the zero rate. Both figures also show the corresponding 95% confidence intervals, which were computed by the program PRISM 4.0 [26].

As far as the goodness of fit is concerned, both models, the quadratic and the linear-plus-plateau model, fitted equally well. Cerrato and Blackmer [6] reached the same conclusion, but an equally good fit does not settle the matter because, despite this similarity, Cerrato and Blackmer [6] and many other authors found marked discrepancies with respect to the $N_{\text{opt}}$ estimates [27–30]. Such discrepancies also apply to the confidence intervals for $N_{\text{opt}}$ and their length. The linear-plus-plateau model provides smaller confidence sets, as can be seen in Table 3. In contrast to the parabola-based confidence sets, these confidence sets do not depend on prices unless the price ratio is greater than the gradient of the increasing straight line [31], which is not the case here. When considering all N levels, as in Figure 4, the increasing straight line's slope is for the high-yield zone and 0.033 t kg$^{-1}$ = 33 kg kg$^{-1}$ for the low-yield zone. Figure 5 depicts the design without the zero rate, where the slope of the increasing straight line of the linear-plus-plateau model is 19 kg kg$^{-1}$ for the high-yield zone and 15 kg kg$^{-1}$ for the low-yield zone. All these slopes are greater than the realistic price ratios $p_N/p_Y$, which at present do not exceed the value 12 kg kg$^{-1}$, as seen in Table 1. Thus, the economic optimum, $N_{\text{opt}}$, equals the transition point from the increasing straight line to the horizontal, which is a parameter of the linear-plus-plateau model. Since PRISM calculates asymptotic confidence intervals for all model parameters [26] (2003, chap. E), they were automatically obtained for $N_{\text{opt}}$. They are displayed in Table 3.
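The price argument for the linear-plus-plateau model can be sketched in a few lines. The function names, the intercept, and the breakpoint of 130 kg N per ha are hypothetical; only the slopes of 33 and 15 kg of grain per kg of N and the upper price ratio of 12 are taken from the text. Slope and price ratio must be given in the same units.

```python
def lpp_yield(n_rate, intercept, slope, breakpoint):
    """Linear-plus-plateau yield: rises linearly up to the breakpoint, then stays flat."""
    return intercept + slope * min(n_rate, breakpoint)


def lpp_economic_optimum(slope, breakpoint, price_ratio):
    """Economically optimal rate under the linear-plus-plateau model.

    If each unit of N returns more yield than it costs (slope > price_ratio),
    fertilizing up to the breakpoint pays and nothing beyond it does.
    Otherwise no fertilization is profitable. The borderline case
    slope == price_ratio is treated as "not profitable" in this sketch.
    """
    if slope > price_ratio:
        return breakpoint
    return 0.0


HYPOTHETICAL_BREAKPOINT = 130.0  # kg N per ha, for illustration only

# Slopes from the text (kg grain per kg N) against the largest realistic ratio.
opt_all_levels = lpp_economic_optimum(33.0, HYPOTHETICAL_BREAKPOINT, 12.0)
opt_without_zero = lpp_economic_optimum(15.0, HYPOTHETICAL_BREAKPOINT, 12.0)

# A price ratio above the slope would remove the incentive to fertilize.
opt_expensive_n = lpp_economic_optimum(15.0, HYPOTHETICAL_BREAKPOINT, 20.0)
```

As long as the price ratio stays below the slope, the optimum equals the breakpoint regardless of the exact prices, which is why a confidence interval for the breakpoint parameter serves directly as a confidence interval for the economic optimum.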

The shorter length of the confidence intervals can be attributed to the fact that the transition from the increasing straight line to the horizontal is clearly easier to identify than the position where a parabola has a rather flat positive gradient. It is not surprising that the inferences from the two models, the linear-plus-plateau and the quadratic one, which are based on very different biological assumptions, are incompatible. Nevertheless, it may be surprising that the confidence intervals for $N_{\text{opt}}$ differed so much between the two models. For the design with all N levels, these confidence intervals did not even overlap. In the high-yield area, the linear-plus-plateau model resulted in a 95% confidence interval of [158 kg N ha$^{-1}$, 200 kg N ha$^{-1}$], whereas the quadratic model yielded a confidence interval of [204 kg N ha$^{-1}$, 328 kg N ha$^{-1}$]. The same pattern was seen in the low-yield area. If the linear-plus-plateau model is assumed to be true, then the estimate lies between 106 and 148 kg N ha$^{-1}$, whereas the quadratic model considers fertilization between 173 and 260 kg N ha$^{-1}$ as optimal.

5. Discussion

In spite of the symmetrical model, the proposed confidence interval for $N_{\text{opt}}$ is asymmetric around the point estimate. This has the advantage of better adaptation to the real situation, but the confidence set can also deviate from the interval form, so that it contains profit maxima as well as profit minima. The latter is the exact opposite of what we want. These types of degenerate confidence sets only occur when extreme N rates, such as zero or very high rates, are missing. When only N rates that lie in the area of the possible optimum are tested, the yield data do not clearly indicate that the model parabola is concave. Confidence sets are only useful if they are small, but short confidence intervals can only be obtained if the empirical data clearly indicate a concave model and thus a small area where the true profit maximum could be. Then gap-type confidence sets, which also contain profit minima, no longer occur, and the disadvantage that the proposed kind of confidence set does not distinguish between profit maxima and minima is not an issue.

Therefore, it seems necessary to calculate confidence intervals based only on designs that include the zero rate and very high N rates, so that a concave shape of the yield-versus-N scatterplot is clearly recognizable. In such datasets, these extreme N levels will have, due to their leverage effect, a strong influence on the regression parabola and thus also on the point estimate and the confidence interval for $N_{\text{opt}}$. The point estimate and the confidence interval are then mainly determined by the choice of the model for the yield function between the extreme N levels, which lie far away from the economic optimum. The estimation of $N_{\text{opt}}$ then only seems to be a “by-product” of fitting a simple model to an unknown yield function over a very wide range. As seen in Table 3, the confidence sets based on different models are so different that they do not even overlap when all N levels are considered. The true model, if it exists, would lead to a good estimate of the by-product, $N_{\text{opt}}$, but none of the known simple models can describe reality accurately over a wide area.

The quadratic model is characterized by its strict symmetry around the maximum yield, although there is nothing symmetrical in nature, especially under field conditions and in many ecological systems. That is why many studies in the past have shown that the quadratic model greatly overestimated $N_{\text{opt}}$, and the present study also shows that the quadratic model leads to much higher estimates than the linear-plus-plateau model. The constant slope of the latter is also not natural, but the quadratic curve is especially troublesome because N rates in the deficient range affect the yield predicted in the above-optimal range (by forcing yields to decrease artificially) and vice versa, even if a yield decrease from excessive N rates is not evident. If wheat lodging is observed at excessive N rates, then the quadratic model might be appropriate.

Marked discrepancies with respect to the underlying model had already been pointed out by many authors [6, 27–29]. These discrepancies equally affect the corresponding confidence intervals. Computing and comparing them with regard to different models is, therefore, an important field of research. The resulting discrepancies and the enormous length of the confidence intervals further heighten the awareness of how inaccurately $N_{\text{opt}}$ can be estimated.

6. Conclusions

The usual modeling and estimation of yield functions and their optima give the impression of scientific exactness. The derived point estimate of $N_{\text{opt}}$, together with an $R^2$ that is not too low, is misleading in that it creates overconfidence in the point estimate of $N_{\text{opt}}$. In this article, however, confidence intervals were derived to show that this confidence in the estimate $\hat N_{\text{opt}}$, at least for these experimental data, is not warranted. The confidence intervals were extremely long and strongly dependent on the N level design and on the model. Thus, the economic optimum can only be very inaccurately estimated from yield regression models based on N rate trials. This is also attributed to the fact that the range of fertilizer applications that are very close to the economic optimum is very wide. The economic profit would not differ much within an application range of 150 kg N ha$^{-1}$ to 250 kg N ha$^{-1}$. Each of these values approximates the optimum. There is a very wide range in which the exact optimum could be found. Even though a particularly comprehensive N rate trial was performed, something that can only rarely be implemented in practice, such a search by means of yield data is still very difficult due to local heterogeneity and the weakness of all models.

By means of the presented methods and further N rate trials, further studies should be done to see whether the findings here recur under other site conditions and in other experimental years. Should this be the case, then in future, ranges for the economic optima should be reported in addition to their point estimates. Because these ranges are very wide, the challenge is, from an ecological point of view, to identify the corresponding N rate recommendations from the lower limit of these ranges, so that economically unnecessary N balance surpluses are avoided.