Abstract

Morphometric analyses have great potential for application in fruit crops, especially in the construction of indices that can be linked to biophysical and/or biochemical quantities of a physiological nature. For example, in peaches, it is convenient to establish quality attributes for harvest or postharvest, where usually the sigmoidal or double sigmoidal models describe the growth of some indicators. The nonlinear nature of this and other associated models sometimes makes it difficult to construct approximate growth rates, so instantaneous rates are used instead. The calculation of approximate rates in nonlinear models may be inappropriate due to aspects related to the phrase known as the “average fallacy.” In this research, different classification algorithms are applied to select the approximately linear phase present in various nonlinear models of variables or parameters used in the modeling of the growth of a crop. A 3D line model was fitted in the extracted section using the decomposition of singular values to generate a simple form of the growth rate. The application was illustrated with growth data of the equatorial and longitudinal diameters of peach fruits measured on different days after defoliation, using data from different elevations above sea level. The proposal simplifies obtaining some growth rates using nonconventional methods; in addition, it allows the comparison and adjustment of the model for the different elevations considered, which provides a novel way for the teaching of certain areas of applied mathematics in plant physiology.

1. Introduction

The peach (Prunus persica) is a cold weather species that is well adapted to climates in temperate and subtemperate areas with cool winters and hot summers [1]. Its fruits are among the 10 most produced worldwide [2]. Peach cultivation is currently spreading to nontraditional areas in subtropical and tropical regions, where the climate differs from the peaches’ natural habitat [3, 4]. In 2019, 30,232 tons of peaches were produced in the Colombian highlands with a yield of 9.96 tons/ha. In these nontraditional areas, peach growth behavior is diverse, and so research continues to be generated to describe it. In the province of Pamplona, Norte de Santander, peach cultivation using varieties with low cold requirements [5] is an important economic activity since it is a source of income for producers of the region [6].

The growth of the fruit of the peach tree has been described as a double sigmoid [7]; however, Bruna and Moreto [8] indicate that this statement is not consistent because there are different varieties in the world that differ in terms of their phenological cycle of fruits and harvest due to genetic and environmental conditions. This highlights the presence of short-cycle materials that can be ultra-early that follow sigmoid behavior, while cultivars with an intermediate or medium and long or late and very late cycles follow a double sigmoid behavior [9]. These differences are inherent to the region and management of the crop, which makes it necessary to develop specific models for the study of fruit growth for agronomic decision making [10].

Measurements of the equatorial and/or longitudinal diameter have been used in the characterization of the behavior of peach fruit growth, with regression models being the most prominent [11, 12]; however, these measurements are applicable to many other fruits and organs with similar shapes where their growth is usually fitted with nonlinear models.

In plant physiology, the growth of plants and their organs has been of great interest due to their heterogeneous nature and the importance of associated structural changes such as performance indicators [13], disease tolerance [14], and inherent crop management. From this interest, various proposals have been generated to model growth and predict its performance, either by directly measuring changes in growth or by using indices associated with these changes. Huxley [15], in his book “Problems of Relative Growth,” stated that growth could be expressed in a simple and quantitative way through empirical formulas that relate the facts observed in the field. Later, Jain et al. [16] described the use of growth indices to predict yield using biometric characteristics recorded on various crops and at different growth stages.

Usually, to construct the associated growth indices, it is stated that under linear growth, the first derivatives can be approximated by difference ratios (Table 1) in the mean value column [17]. Watson [18] observed that in short intervals of time (1-2 weeks), the approximation of the growth rate is satisfactory, and the errors introduced are negligible for field crops, while the sampling intervals with shorter lengths can generate biases by increasing the departure from linearity. More exhaustively, Radford [19] presented the use and abuse of different physiological indices together with the necessary conditions to apply and interpret them properly. However, despite this information, in many cases, these considerations continue to be ignored, thus omitting the reality associated with the fallacy of averages in nonlinear behaviors [20]. In addition, the start and end dates of the sampling can have a great effect on the adjusted model since it only describes a stretch of growth where the characteristic phases of some known models are not detailed; however, still approximately linear sections are perceived where an approximate rate might be useful from a physiological point of view. For this reason, the objective of the current research was to propose a mathematical and computational strategy to extract linear sections in nonlinear or linear models with nonzero curvature, usually associated to the highest growth rate or time intervals of constant rate, in order to do modeling by fitting 3D lines and thus propose a different way to obtain the growth rates. The results made it possible to compare the curves, calculate the distance between the curves, and estimate an adjustment measure analogous to the root of the mean square of the error (but with orthogonal distances for the comparison of the models for different altitudinal levels) [21]. The model can be easily extended to any other crop in both linear and nonlinear models in the parameters.

2. Methods and Applications

2.1. Characterization of the Study Area

The Jarillo variety (Prunus persica) of this study has the following agronomic characteristics: it has yellow skin and pulp, small size, rounded shape, and high prolific production, and it is susceptible to postharvest handling. The plants are large due to a planting distance of 6–7 m between plants and furrows that are distributed according to homogeneous conditions of slope and agronomic management, forming plots with peach trees in full agronomic production from sexual seed and trees aged 11 years. The plantations used in the research were within the physiographic zone of the Santander mountains that included the middle zones of the Eastern Cordillera of the Andes in the department of Norte de Santander, between 1670 and 2270 meters above sea level (Figure 1). In Pamplona, the “Delicias” farm, Chíchira village (7°22′ 43.6″N, 72°37′41.1″W), had an altitude of 2170 m.a.s.l., an average temperature of 16°C, and rainfall of 933.9 mm with a bimodal regime. In Chitagá, “Recuerdo” farm, Carrillo village (7°11′15″N, 72°39′7.3″W), had an altitude of 1870 m.a.s.l. with an average temperature of 18°C, precipitation of 879.5 mm, regimen unimodal, and in Pamplonita, “Bella Vista” farm, Batagá village (7°26′18.1″N, 72°38′9″W), an altitude of 1670 m.a.s.l., average temperature 20°C with 1200 mm of rainfall in a bimodal regime.

2.2. Plant Material Attributes and Spatial Sampling

Sampling was established by locality on 11-year-old trees, planted with sexual seed, which were pruned after observing their architecture, phytosanitary status, and location of productive branches. Pruning was done on mixed branches to balance the architecture of the tree. Thinning is done before the fruit hardens and after the natural fall of the fruit. In Pamplonita (1670 m.a.s.l.), it is done 60 days after defoliation, and in Pamplona (2170 m.a.s.l.) and Chitagá (1870 m.a.s.l.), it is done at 70 days. Each of these regions is associated with each of the elevations studied. Pruning and thinning as agronomic activities usually improve fruit size and quality [22]. During the productive phases, weekly observations were made on three productive branches. The nonextended BBCH scale of 10 developmental stages was used for phenological identification of growth stages [23].

To form the data matrix, the variables, equatorial diameter (), and longitudinal diameter () of the fruit measured in cm were available. A record of around 15 days was kept of the fruits that began to be measured around 32 days after defoliation (τ). Diameter measurements were collected with the Ubermann electronic vernier caliper (Fa. Sodimac, Santiago, Chile). Sampling of variables was carried out for each altitude level using a Latin hypercube sampling [24] using the clhs library of R software (4.2.1), generating 13 trees per thermal level from which three fruits were taken from the middle third of the tree. Data to illustrate the modeling methodology corresponded to the years 2009 and 2010 that were combined because they presented a similar growth pattern defined by a sampling time that does not allow a clear differentiation between the sigmoidal or double sigmoidal model, but this one does not show the pattern that might be desirable to calculate a growth rate that can be separated to illustrate the proposed methodology.

2.3. Approximately Linear Phase Selection

For the election of the phase of interest, in principle, a three-dimensional graph involving diameters and time was constructed; later two-dimensional projections of date against diameter were made to achieve a first approach to the dates where the phase of interest was observed (approximately linear). Next, several methods of cluster analysis of morphometric measurements (Ward, Clara, K-means, fuzzy K-means, and model-based) and the plot based on the intracluster sums of squares were used to define the number of groups that would guarantee the best grouping and the smallest sums of average orthogonal quadratic distances. The best classification was used as a criterion to delimit the phase of interest but could be used for any other phase where a 3D line fit might be appropriate for modeling.

2.4. Growth Rates

Models are usually required in physiology to study their growth rates to compare varieties according to these metrics or to study nutritional requirements or carry out agronomic management according to development phase; however, a usual form continues to appear in the literature to obtain the growth rates [25] that imply absolute or relative changes in growth depending on who provided the data, the plant, or the crop, as illustrated in Table 1.

As the rates labeled as CRT and HAT are obtained from the CT, all the work is illustrated with this first rate, so this other pair of rates is obtained with a simple additional calculation.

2.5. Modeling Using 3D Lines

An approximately linear phase point cloud display was generated, and 3D lines were fitted for each altitude, yielding a simple model written in parametric form. One way to determine the equation of a 3D line is from a point in R3 and a normal direction vector [26]. Usually, the straight lines in the plane are fitted using a least-squares estimation; however, in the current case, an orthogonal distance regression line in R3 was used that was estimated by one of the applications of the singular value decomposition (SVD). A simple description of the algorithm involves a sequence of five steps: (1) input of the point cloud, (2) calculations of the centroid of this cloud, (3) calculations of the deviations of each observation with respect to the centroid, (4) generation of the DVS from the previous deviations, and (5) establishment of the direction of the vector that is associated with the largest singular value, since the desired line is simply the output of the centroid and the direction [27].

Describing the algorithm, a bit more, suppose you want to find the line that is closest to a set of n three-dimensional points; and the proximity is measured by the squared sum of the orthogonal distances between the line and the points. Let the position of the line be represented by a point c (centroid) belonging to the line and let the unit vector be the normal to the line that determines its direction. The orthogonal distance between a point pi and the line is then (pic) Tu (where T represents the transpose operator), and thus the line can be found by solving

Solving for gives  = 1/n. Let Di=pi- be the i-th row of the matrix D associated with the deviations. The minimization problem can be rewritten as

Now using the DVS of D, we have that D = USV′; in this way, . The algorithm can be described as follows:(1)Calculation of .(2)Calculation of matrix D.(3)Obtaining the DVS of D=USV′.(4)Obtaining from (third column of : normal vector).

For practical purposes, the equation was converted into a symmetric parameterized form to demonstrate the process of obtaining the TAC (Table 1) and other aspects related to the comparison between lines, shortest distances, and the fit measure. Modeling was performed for each altitude level using the scikit spatial 6.2.0 Python library. Finally, d (2) was obtained for each altitude, formulating an expression in terms of the components of the vector [28].

3. Results and Discussion

In several studies, models relating to the behavior of the longitudinal and equatorial diameters of peaches have been proposed [7, 11]. These studies used various nonlinear models to study the diametral behavior in peaches. However, these functional forms usually had some difficulty in certain users in obtaining instantaneous rates or when their approximate form was used that might not be convenient at all stages of growth. Therefore, we used a 3D visualization (when three-dimensional is the case but can be applied to lines in higher-dimensional spaces by involving more morphometric variables), allowing the point cloud to be separated into similar regions in terms of rates. For example, in Figure 2, three phases were identified, one of which was the one with approximate constant growth (very low curvature); however, from the point cloud, it was not easy to differentiate this section, so classification algorithms were used for their separation. Clearly, the sigmoidal model would have been the linear growth phase or the one known as the fruit filling phase, and that could vary according to variety, climate, and other aspects such as the position of the fruit [29]. Figure 3 describes the point cloud of the three variables involved. Each color represents a different altitude. A diffuse pattern is highlighted, probably attributable to the sampling time, but that finally has an approximately linear section with an approximately constant rate that can be used in an illustrative way to properly apply the calculation of growth rates in similar sections. These fall into measures that, in the presence of nonlinearity, distort the values of these rates or that do not represent the changes in the growth rate in the time interval.

For the selection of the central section, various classification algorithms were used, initially specifying three groups for the R program (also suggested by the intragroup sum of squares graph) except for the method based on the model that suggested four groups. The groups were generated without discrimination by altitude. Once selected, they were marked with a different color as shown in Figure 3 for each of the methods.

With these classifications, the central section (Table 2) was extracted that was assumed to belong to the group of the same pattern, in this case the approximately linear one. In each algorithm, the number of members and the time interval were generated by truncating 5% because some algorithms overlap some observations between groups. It was not only similar in the number of members of the extracted section, but also in the time range of each section [30].

The fit of the line in R3 was determined by the point c =  -centroid on the line and in the direction of u parallel to the line in the sense that if the vector begins at some point on the line, then the entire vector lies on the line. The set of points is given by the vector equationwith . Making , we can write the separate expression as

From equation (4) we obtained a simple expression for the growth rates associated with equatorial diameter and the longitudinal diameterwith the application of the chain rule as follows

So, it is enough to obtain the vector u =  to know the TAC, which corresponds to the first eigenvector of the DVS carried out with the Python program that allowed calculation of the vectors corresponding to each elevation with its respective centroid. Table 3 shows the results of the adjustment by applying the algorithm.

The growth rates (TAC) by elevation and diameter after applying (5) can be seen in Table 4.

The results show similar growth rates between the two highest altitudes with respect to 1670 m.a.s.l., with an equatorial diameter greater than the longitudinal one, resembling a little more than a flattened fruit [31].

The point diagrams and the 3D lines for each altitudinal floor (elevation) are shown in Figure 4, separating them by the type of agglomeration algorithm. The diagrams show the obliquity of the 3D lines; in fact, it can be shown in this case that the lines are not parallel because when dividing the components of u element by element for each elevation, a constant was not obtained, so they did not have the same growth rate for any altitude. This can be attributed to a different accumulation of photo-assimilates, soluble solids, or malic acid accumulation [32].

For the demonstration of nonintersection between curves, the curves of the table were equalized, obtaining the parameters of the two curves. When substituting them in the third, the equality was not satisfied, and, in this way, obliquity was demonstrated as shown in Figure 4. In addition, the straight lines did not intersect at any point in the section. With this result, it was possible to obtain the times where the distance between the curves was minimal, allowing calculation of the time () where the diameters were more similar at each altitude. Following the optimization procedure of Srinivasan [21], the functions F (t, s) = d2(P1, Q1), F (t,) = d2(P2, Q2), and F(s,) = d2(P3,Q3) are the squared distances of the point P belonging to the first line and Q belonging to the second for the three altitudes (i = 1,2,3). Writing in a general way the three straight line equations, we havewith by the singular value decomposition. Illustrating with the calculation of the shortest L1-L2 distance by , y, with , the expression was generated:

Similarly, it was possible to proceed for the other two distances. The critical points were obtained by satisfying

Solving for ,

Now solving for ,

Using the established conventions, (9) and (10) can be written in vectorial form.obtained fromwith as an identity matrix of dimension 3 × 3. Writing equation () in the same vector way gives

In an analogous way, we can obtain F (t,) and F (s,). To obtain the distances, a Python function was created to calculate the minimum distance. The R and/or Python functions appear on GitHub to perform the necessary calculations (Table 5).

A final calculation was the generation of an adjustment measure that allowed comparison between altitudes. With (2), it was possible to obtain the sum of all orthogonal distances and a similar measure to the root mean square error was generated. However, with orthogonal distances, we averaged the sum of all quadratic distances that are associated with the residuals of the model and obtained from (2) the calculation of for each grouping method and for each elevation as broken down in Table 6.

The best fit was obtained from the model-based algorithm [33], with the highest altitude data providing the best linear fit but with no great difference between altitudes.

4. Conclusions

This research is a contribution to the morphological characterization of a crop illustrated in the case of peach, which was useful in nonlinear growth models with sections where approximately linear sections were perceived, which could be discriminated by different cluster analysis algorithms.

The novel and simple proposal to obtain the absolute growth rate (TAC) and other indicators of the lines in R3 from one of the applications of the singular value decomposition in the minimization of orthogonal distances provides readers not only an application of methods of differential calculus, optimization, and algebra in the teaching of applied mathematics but also provides a different view of modeling physiological indices that can avoid errors with the use of approximate rates in nonlinear growth.

The methodology allowed for estimating approximately linear growth phases, in the case of TAC, began at 79 and ended at 157 after defoliation.

From the 3D lines, a new and simple way of obtaining growth rates was proposed, which in the case of the current research allowed us to demonstrate how the highest equatorial growth rate was found at the two highest elevations.

The methodology can be extended to different linear or nonlinear models where not only an approximately linear section can be perceived as for line fitting inR3to obtain growth rates using different indicators.

Data Availability

The data are available in the repository of Carlos Rivera and can be found at https://github.com/CarlosRivera1212/PEACH_LINES.

Conflicts of Interest

The authors declare that they have no conflicts of interest.