Research Article | Open Access
Examination of the Effects of Curvature in Geometrical Space on Accuracy of Scaling Derived Projections of Plant Biomass Units: Applications to the Assessment of Average Leaf Biomass in Eelgrass Shoots
Conservation of eelgrass relies on transplants, and evaluation of their success depends on nondestructive measurements of average leaf biomass in shoots, among other variables. Allometric proxies offer a convenient route to such assessments. Identifying surrogates via log transformation and linear regression can yield biased results. Some views nevertheless hold this approach to be meaningful, asserting that curvature in geometrical space explains the bias. An inappropriate correction factor for retransformation bias could also explain inconsistencies. Here, accounting for nonlinearity of the log-transformed response relied on a generalized allometric model whose scaling parameters depend continuously on the descriptor. The accompanying correction factor is conceived as the partial sum of the series expansion of the mean retransformed residuals that leads to the highest reproducibility strength. Fits of particular characterizations of the generalized curvature model conveyed outstanding reproducibility of average eelgrass leaf biomass in shoots. Although nonlinear heteroscedastic regression also proved suitable, only log transformation approaches can unmask a size-related differentiation in the growth form of the leaf. Generally, whenever the structure of the regression error is undetermined, choosing a suitable form of the retransformation correction factor becomes elusive. Compared with customary nonparametric characterizations of this correction factor, the present form proved more efficient. We expect that the offered generalized allometric model, along with the proposed correction factor form, provides a suitable analytical arrangement for the general settings of allometric examination.
The model of relative growth of Huxley is formally stated by means of a scaling relationship of the form
$$w = \beta a^{\alpha}, \tag{1}$$
where $w$ and $a$ are measurable traits, the parameter $\alpha$ is designated as the allometric exponent, and $\beta$ is identified as the normalization constant. This model, also termed the equation of simple allometry, has been extensively used in research problems in biology [1–5], physics, economics, earth sciences, resource management, and conservation [9, 10], among other fields.
Eelgrass provides nursery habitat for waterfowl and fish species. By trapping sediment and damping wave energy, this seagrass promotes shoreline stabilization. Eelgrass services also include nutrient recycling, water filtration, and carbon dioxide removal. Current anthropogenic influences threaten eelgrass permanence. Conservation efforts rely fundamentally on plot transplanting. Monitoring their effectiveness depends on measurements of standing stock and productivity through time. This makes the assessment of average leaf biomass in shoots a necessary input. But traditional estimation of eelgrass leaf biomass units relies on destructive methods, which could alter shoot density in a developing transplant. Evaluation therefore requires indirect assessment methods [10, 11]. Results show that an allometric scaling of the form of (1) for eelgrass leaf biomass and associated area is consistent. Derived projections of individual leaf biomass convey useful surrogates for mean leaf biomass in shoots. Moreover, estimates of the allometric exponent $\alpha$ and the normalization constant $\beta$ are invariant within a given geographical region [10–12]. Hence, estimates fitted at one site can provide suitable projections of leaf biomass values currently observed elsewhere in the region. This gives the referred allometric projections a convenient nondestructive character ([11, 13]).
Simplicity of (1) makes allometric projection of eelgrass leaf biomass units convenient. But there are caveats on dependability. For instance, even though the allometric parameters $\alpha$ and $\beta$ are invariant within a region, environmental influences can induce some variability in local estimates ([12, 14]). Besides, the response in the power function-like scaling of (1) is very sensitive to variation in parameter estimates. Hence, the accuracy of derived proxies is subject to error propagation from the estimates. In addition, there are factors of biological scaling that can influence precision of estimates (e.g., [10, 11, 13, 15]). Packard questioned results of Mascaro et al. on allometric examination, and Mascaro et al. responded to the criticism. Reviewing this exchange highlights the relevance of procedural factors in determining precision of parameter estimates of allometric scaling. It also offers a convenient framework for the aims of the present research.
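The sensitivity of the power-function response to its parameter estimates can be made explicit with a first-order error propagation sketch. It assumes (1) is written as $w = \beta a^{\alpha}$, with $w$ the leaf biomass and $a$ the leaf area (the symbols are an assumption where the extracted text lost them), and considers small errors $\Delta\alpha$ and $\Delta\beta$ in the estimates:

```latex
\ln w = \ln\beta + \alpha \ln a
\quad\Longrightarrow\quad
\frac{\Delta w}{w} \;\approx\; \frac{\Delta\beta}{\beta} + \ln(a)\,\Delta\alpha .
```

Under this approximation, an error in the exponent is scaled by $\ln a$, so its relative effect on projected biomass grows with leaf area whenever $a$ is well above one unit.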
An important factor influencing precision of estimates of allometric parameters is the analysis method. A widespread approach is the traditional analysis method of allometry (TAMA hereafter). It relies on a log transformation of data in arithmetical scale in order to consider a linear regression model in geometrical scale. Then, the fitted line is back-transformed to yield a two-parameter power function in the original scale. But embracing this procedure fuels a vivid, unresolved debate. Some views assert that this protocol can lead to biased results (e.g., [16, 19–29]), while other practitioners remain wedded to the idea that logarithmic transformations are necessary (e.g., [18, 30–39]). An alternative to the TAMA approach is to use nonlinear regression methods in the direct scale of the data. Echavarria-Heras et al. concluded that producing allometric projections of average leaf biomass in eelgrass shoots must rely on this protocol. Yet a direct nonlinear regression approach in allometry is not infallible either. For instance, inadequate identification of the inherent error structure can lead to significant bias. Besides, Lai et al. found that estimates of allometric parameters fitted by nonlinear regression can be highly sensitive to the largest values of the covariate. Therefore, the suitability of the analysis method for acquiring consistent eelgrass leaf biomass proxies needs to be reexamined.
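The TAMA steps described above (log-transform, fit a line, back-transform) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code; the function names `fit_line` and `tama_fit`, the random seed, and the parameter values are all hypothetical.

```python
import math
import random

def fit_line(u, v):
    """Ordinary least squares fit of v = c0 + c1 * u; returns (c0, c1)."""
    n = len(u)
    mu_u = sum(u) / n
    mu_v = sum(v) / n
    sxx = sum((ui - mu_u) ** 2 for ui in u)
    sxy = sum((ui - mu_u) * (vi - mu_v) for ui, vi in zip(u, v))
    slope = sxy / sxx
    return mu_v - slope * mu_u, slope

def tama_fit(area, weight):
    """Log-transform both variables, fit a line, return (beta, alpha, residuals)."""
    u = [math.log(a) for a in area]
    v = [math.log(w) for w in weight]
    ln_beta, alpha = fit_line(u, v)
    residuals = [vi - (ln_beta + alpha * ui) for ui, vi in zip(u, v)]
    return math.exp(ln_beta), alpha, residuals

# Synthetic illustration only (hypothetical values, not the San Quintin data).
rng = random.Random(0)
true_beta, true_alpha = 1.0e-5, 1.1
area = [rng.uniform(50.0, 5000.0) for _ in range(500)]
weight = [true_beta * a ** true_alpha * math.exp(rng.gauss(0.0, 0.2)) for a in area]

beta_hat, alpha_hat, resid = tama_fit(area, weight)
print(beta_hat, alpha_hat)
```

The residuals returned here live in geometrical (log) space; any back-transformed mean response still needs a correction factor for retransformation bias.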
The adoption of methods that account for curvature in geometrical space could offer a way to overcome the inadequacy of the TAMA procedure ([27, 40–42]). In particular, it is pertinent to examine whether taking curvature into account leads to improved accuracy of eelgrass leaf biomass proxies. But, according to Mascaro et al., curvature could manifest because of methodological factors of data gathering. Thus, an examination of the effect of curvature in eelgrass leaf biomass allometry must also take into account a possible contribution of data quality effects. Mascaro et al. recall three ways of handling curvilinearity in geometrical space. One is to separate the data and consider different local linear models that account for heterogeneity of effects of the covariate [43–45]. A second is to fit a polynomial model [46–49]. A third approach endorses direct nonlinear regression assuming a heteroscedastic error structure, as contemplated by Mascaro et al. Each of these approaches entails complexity beyond the linear model in geometrical space associated with the customary bivariate power function of allometry. This suggested putting forward a generalized allometric model intended to deal with curvature in geometrical space. This paradigm incorporates parameters that change as continuous functions of the log-transformed covariate ([27, 39, 46]). As we explain further on, the curvature arrangements recommended by Mascaro et al. can all be derived from the offered formalization. Moreover, the nonzero-intercept power function that Packard recommends to handle curvilinearity in geometrical space also derives from the presented generalized scaling model.
But any scheme addressing curvature in geometrical space depends on a factor that corrects the bias induced by retransformation of the regression error. In the general setting, if $\epsilon$ stands for the regression error, then this correction factor, denoted throughout by $\delta$, is given by the mean of the exponentiated error random variable; that is, $\delta = E(e^{\epsilon})$ [51–53]. Furthermore, the TAMA approach relies on the essential assumption that $\epsilon$ is additive, normally distributed, and homoscedastic. When this happens, $\delta$ takes its lognormal-mean form $\delta = e^{\sigma^{2}/2}$, with $\sigma^{2}$ the variance of $\epsilon$ [51–53]. But if $\epsilon$ fails to be normally distributed, there are two possibilities. If the distribution of $\epsilon$ is known, we could derive a closed form for $\delta$. In turn, if the error distribution is not identified a priori, a widespread approach is to take $\delta$ in the nonparametric form given by the smearing estimator of bias of Duan. Still, there are caveats. A smearing estimate can fail to compensate for the downward retransformation bias of logged data ([53, 55, 56]). Thus, when the distribution of $\epsilon$ is unspecified, characterizing $\delta$ seems elusive. Here, we put forward a form for $\delta$ aimed at getting around this circumstance. Zeng and Tang proposed a nonparametric alternative to the smearing form. It matches the partial sum of the first three terms of a power series expression of $E(e^{\epsilon})$, assuming $E(\epsilon) = 0$. The form suggested here is a generalization of this construct. It does not require the restriction $E(\epsilon) = 0$ and matches an $n$-term partial sum approximation of the exponential series representation of $E(e^{\epsilon})$. The partial sum that maximizes the reproducibility strength of the retransformed mean response sets the criterion to choose $n$.
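The three kinds of correction factor discussed above can be sketched as small functions of the fitted log-scale residuals. This is one reading of the description, under stated assumptions: the partial-sum form is computed here by replacing each moment of the error in the exponential series with its sample counterpart, and the function names are hypothetical.

```python
import math

def lognormal_factor(residuals):
    """exp(s^2 / 2): appropriate when errors are normal and homoscedastic."""
    n = len(residuals)
    mean = sum(residuals) / n
    var = sum((e - mean) ** 2 for e in residuals) / (n - 1)
    return math.exp(var / 2.0)

def smearing_factor(residuals):
    """Duan's smearing estimate: sample mean of exp(residual)."""
    return sum(math.exp(e) for e in residuals) / len(residuals)

def partial_sum_factor(residuals, n_terms):
    """First n_terms terms of E[exp(e)] = sum_k E[e^k] / k!,
    with each moment E[e^k] replaced by the sample moment (an assumption)."""
    m = len(residuals)
    total = 0.0
    for k in range(n_terms):
        moment = sum(e ** k for e in residuals) / m
        total += moment / math.factorial(k)
    return total

# Toy residuals for illustration only.
r = [-0.3, -0.1, 0.0, 0.1, 0.3]
print(lognormal_factor(r), smearing_factor(r), partial_sum_factor(r, 3))
```

With sample moments, the partial sum approaches the smearing estimate as the number of terms grows, so the choice of the number of terms is what distinguishes the two in practice; the text selects it by reproducibility of the retransformed mean response.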
Present results show that a consideration of curvature in geometrical space, together with a suitable characterization of the correction factor for retransformation bias, offers consistent allometric proxies of observed mean leaf biomass in eelgrass shoots. Hence, contrary to views asserting that direct nonlinear regression is mandatory in allometric examination, our findings validate a comparable reliability of log transformation based methods. This is well in line with the claims of Mascaro et al. and many others against blaming the use of logarithms for incongruent results in allometric analysis. Moreover, keeping the analysis in geometrical space revealed heterogeneity in the inherent leaf biomass scaling pattern. This could not be achieved by clinging to direct nonlinear regression in arithmetic space as the only valid approach to allometric examination. The offered analytical arrangement is expected to be applicable in the general settings of allometry.
2. Materials and Methods
For the aims of the present research, we relied on an extensive eelgrass data set collected in San Quintin Bay, a coastal lagoon on the Pacific side of the Baja California Peninsula, México (30°30′ N, 116°00′ W), over a 13-month sampling period covering a whole-year cycle. The data comprise measurements of length (mm), width (mm), and dry weight (g) of a total of 10412 individual eelgrass leaves taken from 20 randomly thrown 400 cm² quadrats at every monthly visit to the site. A sampling visit is hereafter referred to as a “sampling time”. The length times width proxy provided estimates of leaf area (mm²). In order to test for methodological influences of data gathering, we processed the raw data set according to an outlier removal procedure based on the mean plus or minus two standard deviations. Appendix A presents results of an exploratory analysis of the data.
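The mean plus or minus two standard deviations rule can be sketched as a simple mask. Which quantity the rule is applied to is an assumption here: following the description in Appendix A, the sketch treats the inputs as deviations of each replicate from the mean response of a simple allometric fit. The function name and the toy values are hypothetical.

```python
def two_sd_mask(values):
    """True for entries within two sample standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / (n - 1)) ** 0.5
    return [abs(x - mean) <= 2.0 * sd for x in values]

# Toy deviations from a fitted mean response (illustrative values only).
deviations = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2, -0.15, 3.0]
keep = two_sd_mask(deviations)
cleaned = [x for x, k in zip(deviations, keep) if k]
print(len(deviations), len(cleaned))
```

A single pass is shown; whether the original procedure was iterated until no further replicates were flagged is not stated in the text.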
As specified above, the symbols $w$ and $a$ stand for the biomass of an individual eelgrass leaf and its area, respectively. Echavarría-Heras et al. assert that these variables can be related through the bivariate allometric model of (1). One procedure to acquire estimates for the parameters $\alpha$ and $\beta$ is to fit a nonlinear homoscedastic regression model directly in arithmetical scale. Alternatively, we can use a TAMA approach, that is, fit the linear regression model
$$v = \ln\beta + \alpha u + \epsilon, \tag{2}$$
where $v = \ln w$, $u = \ln a$, and $\epsilon$ is a random error term assumed to be normally distributed with zero mean and variance $\sigma^{2}$, that is, $\epsilon \sim N(0, \sigma^{2})$.
We conceive curvature in geometrical space as a circumstance where the fitting results of the regression model of (2) are inconsistent. Dealing with this situation amounts to considering complexity beyond that incorporated by (1). One possible approach to address curvature is to assume that the scaling parameters $\alpha$ and $\beta$ in (2) depend continuously on the covariate ([27, 39, 46]). This is consistent with the generalized allometric model
$$w = \beta(a)\, a^{\alpha(a)}, \tag{3}$$
with $\alpha(a)$ and $\beta(a)$ intended to be continuous and differentiable functions defined for positive values of the covariate and with $\beta(a)$ being positive. A log transformation $v = \ln w$, $u = \ln a$ of (3) establishes the regression model
$$v = \ln\beta(a) + \alpha(a)\,u + \epsilon, \tag{4}$$
where $\epsilon$, a residual error term, is conceived as a random variable that, in the general setting, follows a distribution whose mean and variance are set by functions of the covariate.
Setting the scaling functions to constants, $\alpha(a) = \alpha$ and $\beta(a) = \beta$, reduces (4) to the regression model of (2). In Appendix B, we explain that (3) accommodates all curvature paradigms suggested by Mascaro et al. These include a biphasic and a polynomial model in geometrical space, as well as the nonlinear heteroscedastic model referred to by Mascaro et al. in direct arithmetical space. Moreover, as shown in Appendix B, the three-parameter power function chosen as an alternative standard for curvature also derives from (3).
2.2.1. Biphasic Model in Geometrical Space
In order to characterize the model of (3) in a biphasic mode, we let and , such thatincluding parameters and and the function given byfor . is a Heaviside function , evaluated at and correspondingly , with being a point separating growth phases and . Then, denoting by means of the resulting form of from (4), we get the biphasic regression modelwhere is a random error term as defined in (4) andwithwhere and for parameters are to be estimated from data.
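The idea of a biphasic fit in geometrical space can be illustrated with a generic two-segment regression. The exact parameterization of the model above (its Heaviside switches and any constraints at the break point) is not recoverable from the extracted text, so this sketch simply fits separate lines on either side of each candidate break point and keeps the split with the smallest total residual sum of squares. All names and the synthetic data are hypothetical.

```python
import random

def ols(u, v):
    """Least squares line; returns (intercept, slope, residual sum of squares)."""
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    sxx = sum((x - mu_u) ** 2 for x in u)
    slope = sum((x - mu_u) * (y - mu_v) for x, y in zip(u, v)) / sxx
    intercept = mu_v - slope * mu_u
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(u, v))
    return intercept, slope, rss

def biphasic_fit(u, v, min_points=10):
    """Scan candidate break points; keep the split with least total RSS."""
    pairs = sorted(zip(u, v))
    us = [p[0] for p in pairs]
    vs = [p[1] for p in pairs]
    best = None
    for i in range(min_points, len(us) - min_points):
        left = ols(us[:i], vs[:i])
        right = ols(us[i:], vs[i:])
        total = left[2] + right[2]
        if best is None or total < best["rss"]:
            best = {"break_u": us[i], "left": left[:2],
                    "right": right[:2], "rss": total}
    return best

# Synthetic log-log data with a change of slope at u = 6 (illustrative only).
rng = random.Random(1)
u = [rng.uniform(3.0, 9.0) for _ in range(300)]
v = [(-10.0 + 0.8 * x if x < 6.0 else -13.0 + 1.3 * x) + rng.gauss(0.0, 0.05)
     for x in u]
fit = biphasic_fit(u, v)
print(fit["break_u"], fit["left"], fit["right"])
```

The exhaustive scan is quadratic in the sample size; for a data set of about ten thousand leaves, a coarser grid of candidate break points would be a reasonable shortcut.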
2.2.2. Polynomial Model in Geometrical Space
Similarly, assume that and , with and for are coefficients; one can acquire a polynomial representation for the generalized mean response function in geometrical space This way the polynomial form of regression (4) becomeswhere is a random error term as defined in (4), withand , for parameters.
2.2.3. Nonlinear Heteroscedastic Model in Arithmetical Space
As we explain ahead, direct algebraic manipulation of (3) leads to the consideration of the nonlinear heteroscedastic regression model addressed by Mascaro et al. ; namely,with , , and being parameters and being a zero mean normally distributed error term with covariate dependent variance , that is, .
A nonlinear homoscedastic form derives from (14) by setting ; that is,And, again is an additive error term assumed to be normally distributed with zero mean and variance , that is,
Appendix A deals with exploratory analysis of data. Appendix B presents notation convention and also explains how all addressed paradigms derive from the generalized model of (3). Appendix C explains the addressed forms of correction factor for bias of retransformation of the regression error. Fitting results of the geometrical space based models appear in Appendix D. Those corresponding to the nonlinear heteroscedastic and homoscedastic models appear in Appendix E. Agreement between observed and projected values is commonly evaluated by analyzing values of Lin’s Concordance Correlation Coefficient (CCC) . This correspondence index is commonly denoted by means of the symbol . Agreement will be defined as poor whenever , moderate for , good for , or excellent for . Besides CCC values, we assessed reproducibility by comparing goodness-of-fit statistics, such as the coefficient of determination (CD), standard error of estimate (SEE), mean prediction error (MPE), total relative error (TRE), average systematic error (ASE), and mean percent standard error (MPSE) ([63–65]). For statistical tasks, we relied on R, release 3.5.
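For reference, Lin's concordance correlation coefficient can be computed from paired observed and projected values as sketched below. The sketch uses the population (divide by n) moments of Lin's original definition; the function name is hypothetical, and the agreement thresholds used in the text are not reproduced here.

```python
def concordance_correlation(observed, predicted):
    """Lin's CCC: 2*cov / (var_o + var_p + (mean_o - mean_p)^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [x + 1.0 for x in obs]  # perfectly correlated but biased by +1
print(concordance_correlation(obs, obs), concordance_correlation(obs, shifted))
```

Unlike the ordinary correlation coefficient, the index is penalized by a systematic shift between observed and projected values, which is why it suits the assessment of retransformation bias.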
Exploratory analysis in Appendix A identifies maximum, minimum, and sample mean values for observed leaf areas and associated dry weights. We also describe the distribution of these variables in terms of quantiles of probability 0.1, 0.25, 0.50, 0.75, and 0.90, for both crude and processed data. Statistical exploration extends to log-transformed values of these variables. We present Q-Q (quantile-quantile) plots for comparison of distribution patterns, as well as boxplots for the 13-month sampling scheme for both crude and processed data. From month 2 to month 6, a reduction in the values of weight and area occurred; this is perhaps explained by an increase in temperature during the period. A similar variation pattern over time appears in both raw and processed data sets.
3.1. Fitting Results of Geometrical Space Models
In order to validate curvature in geometrical space, we compared the linear model derived from (1) as well as biphasic and polynomial alternates derived from the generalized model of (3). Appendix B explains formal matters. Tables 1 and 2 summarize notation convention. Equations numbered beyond (15) belong to the appendices. Appendix D explains corresponding regression protocols.
Fitting results of the TAMA arrangement of (2) appear in Appendix D. Figure 1 shows the spread about TAMA’s linear mean response function. Deviations from this function visibly suggest curvature (red dots). Data processing removed inconsistent replicates, but the spread shown still deviates from a linear mean response. This suggests that curvature in geometrical space cannot be explained by methodological factors related to data gathering.
Fitting results of the biphasic protocol of (8) are summarized in Appendix D. Figure 2 displays the spread about the mean response function in geometrical space. Compared with Figure 1, the biphasic fit provides a consistent account of the different variation patterns among smaller and larger leaves. This supports the judgement that the identified curvature might be due to intrinsic factors of leaf growth rather than methodological influences related to data gathering.
Appendix D presents fitting results of the polynomial model . Figure 3 displays dispersion about the polynomial mean response function in geometrical space . A polynomial representation also exhibits higher consistency than the TAMA arrangement. Recalling the biphasic scheme, the polynomial suggests a smooth transition between two growing phases.
3.2. Model Selection in Geometrical Space
Assessment of models fitted in geometrical space relied on goodness-of-fit statistics, that is, the coefficient of determination, standard error of estimate, mean prediction error, total relative error, average systematic error, and mean percent standard error ([63–65]). Besides, we took into account the concordance correlation coefficient and Akaike’s information index. Table 3 presents results. Goodness-of-fit statistics, CCC, and AIC values disfavored the TAMA protocol. On the contrary, comparison indices favored the biphasic model. Moreover, differences among indices other than TRE and ASE between this scheme and the polynomial are slight. In particular, the highest AIC is associated with the TAMA protocol; therefore, this model bears the least support. The biphasic choice delivered the smallest AIC value. Nevertheless, the difference relative to the polynomial model is slight. Thus, model comparison shows that the TAMA protocol is unsuited, backing the assertion that any model aiming to be consistent with the present data ought to be nonlinear in geometrical space.
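The AIC comparison above can be illustrated for least squares fits with Gaussian errors. The text does not state which formula was used; the sketch assumes the usual maximized Gaussian log-likelihood and counts the error variance as one extra parameter. The function name and the residual values are hypothetical.

```python
import math

def gaussian_aic(residuals, n_mean_params):
    """AIC = -2 log L + 2k for a least squares fit with Gaussian errors.
    k counts the mean-function parameters plus the error variance."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    k = n_mean_params + 1
    return n * (math.log(2.0 * math.pi * rss / n) + 1.0) + 2 * k

# Toy residuals from two hypothetical fits to the same response.
coarse = [0.10, -0.10, 0.20, -0.20]
tight = [0.01, -0.01, 0.02, -0.02]
print(gaussian_aic(coarse, 2), gaussian_aic(tight, 4))
```

Smaller values indicate more support; the comparison is only meaningful among models fitted to the same response on the same scale.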
3.3. Retransformation Results
The TAMA protocol was not supported by the model selection criteria. Anyway, for comparison, corresponding retransformation results are included in Appendix D. Related with the TAMA protocol, fitting results of the biphasic model display a relatively improved distribution of residuals about the zero line. Nevertheless, normal Q-Q plot still shows heavier tails than those expected for a normal distribution. And, again both test statistics and p values of an Anderson-Darling test  provide evidence against normality of residuals. This justifies choosing the nonparametric forms , or for compensation of downward bias induced by retransformation of the regression error ([11, 52, 53]). Table 4 displays comparison statistics for the reproducibility strength of the biphasic mean response as shaped by the different forms of
We can learn that agreement between the biphasic mean response and leaf biomass data is best for Figure 4 shows spread of processed leaf biomass values about the biphasic mean response function as shaped by the considered forms of . We can observe that both and overcompensate the bias correction by Moreover results show that as opposed to the TAMA a biphasic protocol along with the form offers consistent proxies of individual leaf biomass. But it is worth mentioning that in spite of the fact that model selection favored the biphasic scheme, examination of the polynomial model output reveals similar predictive strength to the biphasic alternate.
3.4. Assessing Curvature by Direct Nonlinear Regression
As suggested by Mascaro et al. , effects of curvature in geometrical space can be analyzed by means of the direct nonlinear heteroscedastic regression model of (14). In Appendix B, we explain that such a protocol also derives from the generalized bivariate allometric model of (3). Table 5 presents pertinent notation convention. For comparison, we also present results for the associated homoscedastic case.
Fitting results of the heteroscedastic and homoscedastic models appear in Appendix E. We can learn that estimates for the normalization constant and scaling exponent parameters are very similar. Certainly, corresponding 95% confidence intervals display some overlap. As a result, we can expect similar reproducibility features for both models. Table 6 presents comparison statistics.
We can be aware that model assessment backs the heteroscedastic model. But selection here is mainly on qualitative grounds. It actually concerns the ability of the heteroscedastic model to identify an expected dependence of variance in the covariate. Certainly, the reproducibility strengths of both paradigms are equivalent. Indeed, Figure 5 shows that mean response curves and differ just barely.
Results show that as it occurred for models fitted in geometrical space, data cleaning failed to correct a heavy tails problem for the nonlinear fits. This can be ascertained from the normal Q-Q plot of residuals. This strengthens our point on the consideration of a different error structure from the one assumed here. Exploring the effects of error structure in the fitting of models for curvature addressed here will be a matter of further research. Interestingly, both the homoscedastic and heteroscedastic models seem to induce the same reproducibility strengths.
3.5. Model Assessment in Arithmetical Space
The model selection assay in geometrical space summarized in Table 3 favored the biphasic protocol. Correspondingly, statistics in Table 6 support the nonlinear heteroscedastic model. Results in Table 4 identify the correction factor form required for the largest reproducibility of the retransformation output. Table 7 allows assessment of these models. Half of the comparison indices coincide (CCC, R², SEE, and MPE). In addition, the biphasic model is favored by AIC, ASE, and MPSE. This sets a criterion for selecting curvature in geometrical space as a consistent paradigm for the present data. Accordingly, the biphasic model proves adequate.
3.6. Implications for Allometric Proxies of Mean Leaf Biomass in Eelgrass Shoots
We now consider allometric proxies for average leaf biomass in eelgrass shoots. To obtain these surrogates, we aggregate the allometric projections of biomass for the individual leaves that make up a shoot. For comparison, we consider individual leaf biomass surrogates produced by the different projection methods. Table 8 compares the resulting reproducibility strengths.
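The aggregation step can be sketched as follows. The sketch assumes that each leaf projection takes the retransformed power-function form (correction factor times normalization constant times area raised to the exponent) and that the shoot-level proxy is the average over shoots of the summed leaf projections; both are readings of the text rather than confirmed definitions, and all names and values are hypothetical.

```python
def project_leaf_biomass(area, beta, alpha, delta=1.0):
    """Allometric projection of one leaf's biomass from its area."""
    return delta * beta * area ** alpha

def mean_leaf_biomass_in_shoots(shoots, beta, alpha, delta=1.0):
    """Average over shoots of the summed projected biomass of their leaves.
    `shoots` is a list of lists of leaf areas, one inner list per shoot."""
    totals = [sum(project_leaf_biomass(a, beta, alpha, delta) for a in leaves)
              for leaves in shoots]
    return sum(totals) / len(totals)

# Toy input: two shoots with leaf areas in arbitrary units.
shoots = [[1.0, 2.0], [3.0]]
print(mean_leaf_biomass_in_shoots(shoots, beta=2.0, alpha=1.0))
```

Because the projection needs only leaf areas, the proxy can be obtained from length and width measurements without harvesting the shoots.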
Previous results established that proxies derived from the TAMA protocol are inconsistent with observed values. This endorsed nonlinear regression in the direct scale as a requirement for reliability of allometric projections of mean leaf biomass in eelgrass shoots. But Table 8 shows that a curvature model fitted in geometrical space can offer proxies with predictive power similar to that of a nonlinear regression protocol. The plots in Figure 6 illustrate this assertion.
The customary bivariate allometric model of (1) offers nondestructive surrogates for average leaf biomass in eelgrass shoots. But there are methodological factors that could influence dependability. Some views assert that parameter identification based on logarithmic transformations leads to biased projections [20–29]. But other practitioners cling to this approach as meaningful and necessary in allometric examination [30–39]. The present review suggests that overcoming this controversy amounts to considering curvature in geometrical space. For that aim, we proposed the generalized model of (3). Approaches such as direct nonlinear heteroscedastic regression, as well as biphasic and polynomial protocols in geometrical space, follow as logical consequences of this construct. For the present data, model selection supported keeping the analysis in geometrical space. Nevertheless, at an empirical level, the addressed protocols produced allometric projections of individual leaf biomass of comparable precision. This was also verified for the associated projections of average leaf biomass in shoots. But, from a qualitative standpoint, the nonlinear regression protocol mainly contributed by identifying the expected dependence of the variance on the covariate. Moreover, Figure 7(a) depicts manifest differences in mean response trends between the polynomial fit and the nonlinear heteroscedastic model. Nonetheless, those in Figure 7(b), corresponding to this and the biphasic model, differ only barely. Then, a nonlinear regression scheme at best shaped a reasonable approximation of the mean response function resulting from curvature methods.
Differences in patterns of the biphasic and polynomial mean response functions relative to the nonlinear protocol show that clinging to this last paradigm could impair detection of the true allometric relationship. Moreover, relying on direct nonlinear regression impairs identification of heterogeneity in the log-transformed response as the covariate changes. This further stresses the limitations of this device as a tool for allometric examination ([18, 30]). Conversely, output of the selected biphasic model shown in Figure 2 suggests differentiation of growth patterns among smaller and larger leaves. Besides, the polynomial mean response in Figure 3 suggests a gradual transition between different growth phases. Thus, as opposed to direct nonlinear regression, a consideration of curvature in geometrical space could elucidate an inherent leaf growth pattern. This strengthens the judgement that the log transformation step, essential to traditional allometric examination, cannot be thrown away without losing relevant information ([33, 37]).
Mascaro et al.  conceived curvature in geometrical space as related to methodological factors of data gathering. But present examination corroborated consistency of curvature models for processed data. This suggests that manifestation of curvature is rather explained by intrinsic factors in leaf growth. Additionally, data processing failed to amend the heavy tails problem detected on Q-Q plots. This indicates departure of residuals from an assumed error structure. As a result, numerical values of the addressed correction factor forms turned out to be different, thus conveying ambiguity in selection of suitable mean response of models fitted in geometrical space. This could entail the only advantage of nonlinear regression method over log-transformation-curvature paradigms. But, also for this analysis method, an inadequate postulation of inherent error structure can lead to significant bias . It seems then reasonable considering that a suitable characterization of error structure could lead to robustness of built allometric proxies, even when they derive from crude data. Steering to an error structure different from what is assumed here is a worthwhile subject of further research.
When dealing with similar data we suggest taking into account recommendations that come up from this examination. First, it is highly advisable to perform a preliminary examination of the spread around the straight line in geometrical space resulting from the model of (1). If further statistical exploration confirms that linearity and assumed error structure are consistent with data, Huxley’s bivariate allometric model could suit. Otherwise, the arrangement of curvature, error structure, and correction factor form such as proposed here could be called into account for the analysis. The use of data cleaning procedures in order to achieve a better fit is controversial . Instead of performing data processing a posteriori, it is highly advisable to rely on standardized data gathering procedures. This will prevent proliferation of inconsistent replicates that could exacerbate a heavy tails problem on Q-Q plots.
Failure to perform both a preliminary exploration of spread of log transformed allometric data and a sound evaluation of model adequacy could impair detecting a possible manifestation of curvature. As a consequence, the output of a traditional analysis method could set biased predictions of observed values. This circumstance could result in dismissal of a log transformation step in the analysis, giving way to contemplation of direct nonlinear regression as the only protocol to acquire reliable parameter estimates . Results of this examination suggest that consideration of curvature in geometrical space as set by the model of (3) could offer dependable allometric proxies of average leaf biomass in eelgrass shoots.
From a general perspective, complexity as encompassed by the model of (3) can stand for curvature as conceived in allometric examination. In particular, biphasic or polynomial protocols in geometrical space, as well as a direct nonlinear heteroscedastic regression model, derive as particular characterizations of this paradigm. Moreover, all statistical models for accurate estimates of relative growth contemplated by Bervian et al. can also be accommodated by the present generalization of Huxley’s model of simple allometry. But empirical convenience on its own does not validate adoption of this paradigm as a general tool. Certainly, the Weierstrass approximation theorem backs a polynomial regression model as a reasonable identification device for the generalized allometric model expressed in geometrical space. But suitability of retransformation results will depend appreciably on the correction factor form. And a mean function resulting from a polynomial fitted in geometrical space will not allow the exponent and normalization scaling functions of (3) to be characterized separately. Furthermore, the complexity of (3) could pose significant difficulties when attempting its identification through direct nonlinear regression methods. Needless to say, biological interpretation of these scaling functions is also pending. A quest for efficient tools of nondestructive assessment of plant biomass units justifies addressing these questions in further research.
A. Data Exploratory Analysis
This appendix presents an exploratory analysis of raw and processed data sets contemplated in this examination.
Table 9 describes the distribution pattern, in terms of quantiles, for a sample of 10412 measurements of eelgrass leaf weights and related areas taken over 13 months, which make up the present raw data. The first four columns in the uppermost row label the minimum followed by quantiles of probability 0.1, 0.25, and 0.50. Correspondingly, the fifth column in the first row presents the sample mean, followed by quantiles of probability 0.75 and 0.90 and then the maximum. The second row presents leaf dry weight values and the third row shows corresponding areas. The fourth and fifth rows present the log-transformed weight and area values, respectively. Similarly, Table 10 shows the variation pattern of the 10023 observations that remain after applying to the raw data an outlier removal procedure based on the mean plus or minus two standard deviations, designed to remove replicates considered significantly discrepant from the mean response function of a simple allometric model for a leaf dry weight response in terms of a leaf area covariate.
Comparing the quantile values for the leaf dry weight and corresponding area variables reported in Tables 9 and 10 for raw and processed data, respectively, we can ascertain that, in spite of the removal of 389 discrepant observations by data cleaning, the original and remnant sets show an equivalent distribution pattern. This similarity in distribution before and after data processing can be better perceived in the Q-Q (quantile-quantile) plots of Figure 8 for raw (a) and processed (b) data.
The Q-Q plots in Figure 8 compare the distributions (whatever they are) of the leaf area and associated dry weight variables. The linear relationship observed between the quantiles of the two variables indicates that they have similar distributions, although with different parameters. This similarity holds for both sets of observations. The close agreement of the fitted straight lines for both sets in Table 11 confirms that the distribution pattern does not change after data processing.
Figures 9 and 10 display boxplots for the 13-month sampling scheme. For raw data, the values of both leaf dry weight and linked area decreased from month 2 to month 6. This is perhaps explained by an increase in temperature during those months. Processed data exhibit similar dynamics through time for these variables. Moreover, the overall variation patterns of raw and processed data throughout the 13 months of sampling are similar.
B. Derivation of Regression Protocols
In this appendix, we first present the notation convention for statistics related to the generalized bivariate model of (3). We then explain how the different regression protocols addressed in this examination derive from this paradigm. We also include the notation convention for the related statistics.
B.1. Generalized Bivariate Allometric Model
A transformation $v = \ln w$ and $u = \ln a$ of (3) establishes the generalized regression model in geometrical space given by (4). The term $v(u)$ stands for the corresponding mean response function. Using the customary notation convention, we represent $v(u)$ through the symbol $E(v \mid u)$; that is,
$$E(v \mid u) = v(u). \tag{B.1}$$
Furthermore, back-transformation of (4) leads to the result
$$w = \beta(a)\, a^{\alpha(a)} e^{\epsilon}. \tag{B.2}$$
Thus, $e^{\epsilon}$ is understood as a multiplicative error term. Denoting by the symbol $E(w \mid a)$ the corresponding mean response function in arithmetical space, from (B.1) and (B.2), we have
$$E(w \mid a) = \beta(a)\, a^{\alpha(a)} \delta, \tag{B.3}$$
with $\delta = E(e^{\epsilon})$ interpreted as a correction factor of bias of retransformation of the regression error $\epsilon$.
B.2. Traditional Analysis Method of Allometry (TAMA)
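The bivariate allometric model of (1) derives from the model of (3) by setting the scaling parameters constant; that is, $\alpha(a) = \alpha$ and $\beta(a) = \beta$. The resultant regression model in geometrical space is given by (2). The corresponding mean response function will be denoted here by the symbol $E(v \mid u)$. Hence, from (2) we have
$$E(v \mid u) = \ln\beta + \alpha u. \tag{B.4}$$
Retransformation of (2) leads to the result
$$w = \beta a^{\alpha} e^{\epsilon}. \tag{B.5}$$
And the linked mean response function in arithmetical space, denoted through $E(w \mid a)$, becomes
$$E(w \mid a) = \beta a^{\alpha} \delta. \tag{B.6}$$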
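A small simulation may help to see why retransformation of the fitted line calls for a correction factor. The sketch below is only illustrative: the data are synthetic, the names (`fit_log_line`, `ratio_naive`, `ratio_corrected`) are hypothetical, and the correction factor used is the normal-theory form exp(sigma^2 / 2), which is an assumption that suits the simulated errors but need not suit real data.

```python
import math
import random
import statistics

def fit_log_line(areas, weights):
    """Least-squares fit of ln(w) = ln(beta) + alpha * ln(a).

    Returns (ln_beta, alpha, sigma2), with sigma2 the residual variance.
    """
    u = [math.log(a) for a in areas]
    v = [math.log(w) for w in weights]
    mu_u, mu_v = statistics.fmean(u), statistics.fmean(v)
    sxx = sum((ui - mu_u) ** 2 for ui in u)
    sxy = sum((ui - mu_u) * (vi - mu_v) for ui, vi in zip(u, v))
    alpha = sxy / sxx
    ln_beta = mu_v - alpha * mu_u
    rss = sum((vi - ln_beta - alpha * ui) ** 2 for ui, vi in zip(u, v))
    return ln_beta, alpha, rss / (len(u) - 2)

# Synthetic data (made up): w = 0.002 * a**1.2 * exp(eps), eps ~ N(0, 0.5**2).
rng = random.Random(0)
areas = [rng.uniform(1.0, 50.0) for _ in range(5000)]
weights = [0.002 * a ** 1.2 * math.exp(rng.gauss(0.0, 0.5)) for a in areas]

ln_beta, alpha, sigma2 = fit_log_line(areas, weights)
naive = [math.exp(ln_beta) * a ** alpha for a in areas]  # plain back-transformation
delta = math.exp(sigma2 / 2.0)                            # normal-theory correction factor
corrected = [delta * p for p in naive]

# Observed total relative to projected total, without and with the correction.
ratio_naive = sum(weights) / sum(naive)
ratio_corrected = sum(weights) / sum(corrected)
```

With these settings the plain back-transformation underestimates the observed total, while the corrected projection brings the ratio closer to one.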
B.3. Biphasic Model in Geometrical Space
Here, we explain the result of (8), as well as the notation convention for related statistics. In order to characterize a biphasic form of the model of (3), we introduce fixed values and , so that the covariate takes values in a range . We then conceive as a fixed value of satisfying . And recalling the transformation and , we take as a breaking point for transition between two different phases of the variation of Moreover, in the model of (3), we let and given bywhere is the Heaviside  function defined throughThen, the biphasic form of (3) is formally represented byDenoting by the resulting form of , (5) yieldsThen, (B.7), (B.8), and (B.12) imply thatwhereThis explains (8). Since as given by (3) is assumed to be a continuous function of , this property ought to be satisfied by the biphasic model; that is, we require to set a condition, . This leads to the equationMoreover, as given by (8) can be equivalently represented bywith the continuity condition of (B.15).
Correspondingly, the regression model of (4) in its biphasic form becomesAs we explained in (4), is a residual error term conceived as a random variable distributed according to an unknown distribution having mean and variance set by a function of the covariate ; that is,
The form of the generalized mean response for the biphasic model is denoted through and becomesIt follows from (B.16) that the expression for can be equivalently represented byBack-transformation of (B.17) leads to the resultwhich can be written in the formThen, the mean response function in arithmetical space becomes
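The role of the Heaviside function and of the continuity condition at the breaking point can be sketched in a few lines. The parametrization below (a first-phase intercept and slope, a second-phase slope, and a breakpoint) is one common way to write a continuous two-phase line and is an assumption of this sketch; the names `biphasic_log_mean`, `second_phase_intercept`, and `biphasic_arithmetic`, as well as the parameter values, are hypothetical.

```python
import math

def heaviside(x):
    """Unit step: 0 for x < 0 and 1 for x >= 0 (the value at 0 is a convention)."""
    return 1.0 if x >= 0 else 0.0

def biphasic_log_mean(u, ln_beta1, alpha1, alpha2, u_b):
    """Two straight lines in log space that meet at the breakpoint u_b."""
    return ln_beta1 + alpha1 * u + (alpha2 - alpha1) * (u - u_b) * heaviside(u - u_b)

def second_phase_intercept(ln_beta1, alpha1, alpha2, u_b):
    """Intercept of the second line implied by continuity at u_b."""
    return ln_beta1 + (alpha1 - alpha2) * u_b

def biphasic_arithmetic(a, ln_beta1, alpha1, alpha2, a_b):
    """Back-transformed curve, without any retransformation correction factor."""
    u, u_b = math.log(a), math.log(a_b)
    return math.exp(biphasic_log_mean(u, ln_beta1, alpha1, alpha2, u_b))

# Illustrative parameter values (made up).
ln_beta1, alpha1, alpha2, a_b = math.log(0.002), 1.3, 0.9, 10.0
ln_beta2 = second_phase_intercept(ln_beta1, alpha1, alpha2, math.log(a_b))
below = biphasic_arithmetic(5.0, ln_beta1, alpha1, alpha2, a_b)
above = biphasic_arithmetic(20.0, ln_beta1, alpha1, alpha2, a_b)
```

Writing the second phase as an increment multiplied by the step function enforces continuity by construction, so only one intercept needs to be estimated.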
B.4. Polynomial Model in Geometrical Space
In this section, we explain how (12) can be derived from the generalized structure of model of (3). We also set the notation convention for related statistics. For these aims, we begin by considering polynomials and given bywhere, for , and stand for coefficients. Now, let and , whereThis way, we obtain a representation of the model of (3) in the formNow, denoting by means of the associated form of , according to (5), we haveFrom (B.23) through (B.28), we obtainRearranging, we ascertain that takes on the polynomial formwhere , andThis explains the result of (12). Correspondingly, a polynomial characterization of the regression model of (4) takes the formwith being a residual error as described in (4). The form of the generalized mean response for the polynomial characterization is denoted through . It becomesBack-transformation of (B.32) leads toAnd the mean response function in arithmetical space becomes
B.5. Nonlinear Regression Models
In this section, we explain how the nonlinear heteroscedastic model of (14) can be also associated with (3). We also explain the notation convention for related statistics. For these aims, we begin by noticing that in (5) can be also written in the formwhere andNow, if and are as defined in (3), then (B.36) turns out to be nonlinear in geometrical space. Moreover, since from (B.36) we havenoticing that, from (4), , then solving for above yieldsThen, we have that the generalized curvature model of (3) can be also written in the formThus, defining(B.39) implieswhich suggests the nonlinear heteroscedastic regression model of (14). That is,which can be considered as a tool to analyze curvature in geometrical space. Particularly, by setting , we can consider the homoscedastic modelwhere in (B.43) and (B.44) the error term is assumed to be normally distributed random variable having zero mean and homogeneous variance; that is, .
It turns out that the mean response function associated with the heteroscedastic model becomesSimilarly, the mean response function associated with the homoscedastic model becomesFinally, by setting and in (3), a log transformation and leads to the nonlinear regression model in geometrical space:with the error term assumed to be normally distributed with zero mean and constant variance . Then, back-transformation of (B.47) to arithmetical space yieldsThe mean response function in arithmetical space becomesPackard  asserts that, commonly, any data set on arithmetical scale that is consistently described by a three-parameter power function will track a curved path when transformed to the logarithmic scale. Equations (B.45) through (B.47) provide a formal set-up accommodating such a statement.
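The statement attributed to Packard can be checked numerically. The sketch below assumes the usual three-parameter power function, written here as w = w0 + beta * a**alpha; the symbol `w0`, the function names, and the parameter values are hypothetical choices made for illustration. It computes the local slope in log-log coordinates, which is constant only when the additive term vanishes.

```python
import math

def three_param_power(a, w0, beta, alpha):
    """Three-parameter power function w = w0 + beta * a**alpha."""
    return w0 + beta * a ** alpha

def loglog_slope(a, w0, beta, alpha):
    """Analytic local slope d ln(w) / d ln(a) of the three-parameter power function."""
    power = beta * a ** alpha
    return alpha * power / (w0 + power)

def numeric_loglog_slope(a, w0, beta, alpha, h=1e-6):
    """Central finite-difference check of the same slope in log-log coordinates."""
    u = math.log(a)
    upper = math.log(three_param_power(math.exp(u + h), w0, beta, alpha))
    lower = math.log(three_param_power(math.exp(u - h), w0, beta, alpha))
    return (upper - lower) / (2.0 * h)

grid = [1.0, 5.0, 25.0, 125.0]
# With a positive additive term the slope changes along the grid (a curved path).
curved = [loglog_slope(a, w0=0.01, beta=0.002, alpha=1.2) for a in grid]
# With w0 = 0 the model reduces to simple allometry and the slope equals alpha.
straight = [loglog_slope(a, w0=0.0, beta=0.002, alpha=1.2) for a in grid]
```

For a positive additive term the slope rises towards the exponent as the covariate grows, which is the curvature that a straight line in geometrical space cannot follow.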
C. Forms of the Correction Factor of Bias of Retransformation of the Regression Error
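In this appendix, we explain the different characterizations of $\delta$, defined as a correction factor for bias of retransformation of the regression error $\epsilon$ and introduced by (B.3).
If $\epsilon$ is normally distributed with zero mean and constant variance $\sigma^{2}$, that is, $\epsilon \sim N(0, \sigma^{2})$, the resulting form of $\delta$, denoted here by means of the symbol $\delta_{B}$, is given by
$$\delta_{B} = \exp\left(\frac{\sigma^{2}}{2}\right). \tag{C.1}$$
Here $\delta_{B}$ will be referred to as the Baskerville  form of $\delta$.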
In the case of a nonnormally distributed residual error term $\epsilon$, Newman  asserts that $\delta$ must take the form provided by Duan's smearing estimate of bias . Here, this form is represented by means of the symbol $\delta_{D}$, and it is calculated by means of
$$\delta_{D} = \frac{1}{n}\sum_{i=1}^{n} \exp(e_{i}), \tag{C.2}$$
with $e_{i}$ standing for the $i$th residual of the contemplated regression model and $n$ for the number of observations. Nonetheless, this form of $\delta$ could produce bias overcompensation ([53, 55]). Moreover, $\delta_{D}$ corresponds to the sample mean of the retransformed residuals. Therefore, whenever outliers occur, the actual central tendency of the retransformed residuals may not be represented by $\delta_{D}$. Under such a circumstance, a suitable form of the correction factor remains undetermined. Moreover, Zeng and Tang  suggest a distribution-free form of $\delta$, represented here by means of $\delta_{ZT}$ and given by
$$\delta_{ZT} = 1 + \frac{\sigma^{2}}{2}. \tag{C.3}$$
We notice that $\delta_{ZT}$ is actually an approximation to $E(e^{\epsilon})$, as it corresponds to a three-term partial sum of the power series expression of $E(e^{\epsilon})$ assuming $E(\epsilon) = 0$. By the same token, we can consider an alternative approximation for $E(e^{\epsilon})$; this is represented through the symbol $\delta_{m}$ and given by an $m$-term partial sum of the series representation of $E(e^{\epsilon})$; that is,
$$\delta_{m} = \sum_{k=0}^{m-1} \frac{E(\epsilon^{k})}{k!}. \tag{C.4}$$
The value of $m$ leading to the highest concordance correlation coefficient value between projections and observed values sets the explicit form of $\delta_{m}$ in (C.4).
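The alternative forms of the correction factor discussed in this appendix can be compared side by side with a short sketch. Several choices here are assumptions rather than the original computations: expectations are replaced by sample means of residual powers, the partial sum starts at the zeroth-order term, the concordance correlation coefficient is computed in Lin's usual sample form, and all function names and the toy numbers are hypothetical.

```python
import math
import statistics

def baskerville(sigma2):
    """Normal-theory form: exp(sigma^2 / 2)."""
    return math.exp(sigma2 / 2.0)

def duan_smearing(residuals):
    """Sample mean of the retransformed residuals."""
    return statistics.fmean(math.exp(r) for r in residuals)

def zeng_tang(sigma2):
    """Distribution-free form: 1 + sigma^2 / 2."""
    return 1.0 + sigma2 / 2.0

def partial_sum_factor(residuals, n_terms):
    """First n_terms terms of the series of the mean retransformed residual."""
    return sum(statistics.fmean(r ** k for r in residuals) / math.factorial(k)
               for k in range(n_terms))

def concordance(x, y):
    """Concordance correlation coefficient between two equally long samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = statistics.fmean((xi - mx) ** 2 for xi in x)
    syy = statistics.fmean((yi - my) ** 2 for yi in y)
    sxy = statistics.fmean((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)

def best_n_terms(naive_pred, observed, residuals, max_terms=10):
    """Number of series terms giving the highest concordance with observations."""
    def score(n_terms):
        factor = partial_sum_factor(residuals, n_terms)
        return concordance([p * factor for p in naive_pred], observed)
    return max(range(1, max_terms + 1), key=score)

# Toy residuals and projections (made up for illustration).
residuals = [-0.4, -0.1, 0.0, 0.2, 0.3]
naive_pred = [1.0, 2.0, 3.0, 4.0]
observed = [p * duan_smearing(residuals) for p in naive_pred]
chosen = best_n_terms(naive_pred, observed, residuals)
```

As the number of terms grows, the partial sum approaches the smearing estimate, so the selection step effectively searches among truncations of that estimate for the one that agrees best with the observed values.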
D. Fitting Results of Regression Models in Geometrical Space
This appendix presents fitting results of the geometrical space protocols derived from the generalized model of (3). These include the TAMA, biphasic, and polynomial schemes.
D.1. Fitting Results of the TAMA Protocol
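Since we are dealing with bivariate data $(a_{i}, w_{i})$ in arithmetical scale, according to the TAMA protocol, we have to consider pairs $(u_{i}, v_{i})$, where $u_{i} = \ln a_{i}$ and $v_{i} = \ln w_{i}$ for $i = 1, 2, \ldots, n$. Moreover, the linear regression model of (2) becomes
$$v_{i} = \ln\beta + \alpha u_{i} + \epsilon_{i}, \tag{D.1}$$
where the error term $\epsilon_{i}$ is assumed to be normally distributed with zero mean and constant variance $\sigma^{2}$. Hence, the response $v_{i}$ has a normal distribution with constant variance $\sigma^{2}$ and a mean $\mu_{i}$ expressed as a function of the covariate $u_{i}$; namely,
$$\mu_{i} = \ln\beta + \alpha u_{i}. \tag{D.2}$$
The identification method contemplated here relies on finding parameter estimates that maximize the log-likelihood function $\ell$. This is given by
$$\ell = -\frac{n}{2}\ln\left(2\pi\sigma^{2}\right) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(v_{i} - \mu_{i}\right)^{2}. \tag{D.3}$$
Fitting results of the linear model of (D.1) through (D.3) to raw data appear in Table 12.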
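The maximization step can be sketched as follows. Under the normality and constant variance assumptions stated above, the Gaussian log-likelihood of a straight line is maximized by the least-squares intercept and slope together with the residual sum of squares divided by the sample size; the sketch uses that closed form instead of a numerical optimizer. The function names and the five data pairs are hypothetical.

```python
import math
import statistics

def log_likelihood(ln_beta, alpha, sigma2, u, v):
    """Gaussian log-likelihood of the line v = ln_beta + alpha * u."""
    n = len(u)
    rss = sum((vi - ln_beta - alpha * ui) ** 2 for ui, vi in zip(u, v))
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)

def maximum_likelihood_fit(u, v):
    """Closed-form maximizer: least-squares line plus sigma2 = RSS / n."""
    mu_u, mu_v = statistics.fmean(u), statistics.fmean(v)
    sxx = sum((ui - mu_u) ** 2 for ui in u)
    sxy = sum((ui - mu_u) * (vi - mu_v) for ui, vi in zip(u, v))
    alpha = sxy / sxx
    ln_beta = mu_v - alpha * mu_u
    rss = sum((vi - ln_beta - alpha * ui) ** 2 for ui, vi in zip(u, v))
    return ln_beta, alpha, rss / len(u)

# Toy log-transformed pairs (made up for illustration).
u = [0.0, 1.0, 2.0, 3.0, 4.0]
v = [-6.1, -4.9, -4.2, -2.8, -2.0]
ln_beta_hat, alpha_hat, sigma2_hat = maximum_likelihood_fit(u, v)
best = log_likelihood(ln_beta_hat, alpha_hat, sigma2_hat, u, v)
```

Moving any of the three estimates away from these values lowers the log-likelihood, which gives a quick check on a fit obtained by other means.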
Figure 11(a) displays a biased distribution of residuals around the zero line. The normal Q-Q plot in Figure 11(b) can be considered as a visual test of goodness of fit. In this case, it provides primary evidence against normality of the residuals. Indeed, the plot reveals a heavy-tailed pattern. Moreover, an Anderson-Darling  goodness-of-fit test yielded a test statistic of 246.7 and a p value below 2.2e-16, which confirms the lack of normality of the residuals.