#### Abstract

Six generalized algebraic difference equations (GADAs) derived from the base models of log-logistic, Bertalanffy-Richards, and Lundqvist-Korf were used to develop site index model for *Pinus pinea* plantations in north-west of Tunisia. To assure the base-age invariance of the model parameter estimates, a dummy variable approach was used. Data from stem analysis, corrected with Carmean's method, were used for modelling. To take into account the inherent autocorrelation of the longitudinal data, a second-order continuous-time autoregressive error structure was used, which allows the models to be applied to irregularly spaced, unbalanced data. Both a qualitative analysis based on the biological realism of the models and numerical and graphical analyses based on the accuracy of the models as well were used to evaluate the performance of candidate models. The relative error in site index predictions was used to select 30 years as the best reference age. Based on the analysis, a generalized algebraic difference equation (GADA) derived from the base model of Lundqvist-Korf realized the best compromise between biological and statistical constraints, producing the most adequate site index curves. It is a polymorphic model with site-dependent asymptotes. This model is therefore recommended for height growth prediction and site classification of *Pinus pinea* plantations in north-west of Tunisia.

#### 1. Introduction

In Tunisia, *Pinus pinea *planted forests, with an area of approximately 35 000 hectares, provide key productive, protective, and social functions for the welfare of rural areas. *Pinus pinea* has been used in Tunisia in afforestation programmes since 1907, when they were used to stabilize the littoral dunes of Bizerte (north of Tunisia) and then in 1930 to stabilize the littoral dunes along the north-east coasts in the region of the Cap Bon [3]. The success of these first plantations encouraged Tunisian foresters to use this species to afforest the littoral dunes of the northwest of the country as well as the degraded forests of cork oak of the northwest. The distribution of *Pinus pinea* forests in Tunisia includes the wet and subwet bioclimates, where it is well acclimatized to the local conditions despite of the diversity of the pedo-climatic conditions [4].

*Pinus pinea* wood is employed in Tunisia for several purposes, for example, particle board, pallets, construction of boats, poles, or mine poles. It is also employed in paper mill for the production of cellulose and mechanic pulp [3]. In addition, *Pinus pinea* produces one of the most appreciated nonwood forest products: pine kernels.

Although *Pinus pinea* has become, after *Pinus halepensis,* the most used species in afforestation programs in Tunisia, a growth and yield model that can provide reliable predictions of *Pinus pinea* forest stand development under different management schedules is still missing. One of the first steps in developing a system of growth equations is the construction of a reliable site index model to predict the site quality of forest stands.

Site index models based on the height growth of dominant trees are the classical way of indirectly estimating site quality (mostly a combination of soil fertility and climate) in forestry management (e.g., [5–7]). Among the three general methods for site index curve construction (the *guide curves method*, the *parameter prediction method,* and the *difference equation method*), the algebraic difference approach (ADA) presents the advantages [8, 9]: the structure of equations is base-age invariant. The generalized algebraic difference approach (GADA) improves the traditional algebraic difference approach (ADA) allowing more flexible dynamic equations which can be polymorphic and with multiple asymptotes [10]. The objective of this study was to develop a site index model for *Pinus pinea* plantations in the northwest of Tunisia using the Generalized Algebraic Difference Approach (GADA) proposed by Cieszewski and Bailey [6], using dummy variables [11] to fit the growth curves obtained from stem analysis of mean dominant trees.

#### 2. Material and Methods

##### 2.1. Study Area and Data

The study concerned all the* Pinus pinea* plantations in the northwest of Tunisia. These stands, covering about 3600 ha [12], include the reafforestation area extending from Sedjane to Ain Draham mountains at 843 m of altitude. The bioclimate is the humid of tempered variant in altitude and of hot one in the coasts. The annual precipitation depends on the altitude (from 850 mm in the coasts to 1560 mm at Ain Draham). The mean annual temperatures vary from 15.6 to 16.1°C. The maximum mean temperatures of the hottest month, may, reach 31°C and the minimum ones of the coldest month attain 3.9–8.1°C. To refer to the USA classification, soils of the study area are dominated by Psamment groups in the sandy coastal dunes and by Palexeralfs (brown forest soils) in mountain [13]. These are deep soils with high water retention.

Data come from 62 temporary sample plots located in the planted *Pinus pinea* forests in northwest of Tunisia (Figure 1). The plots were of circular form with a surface of 0.04 ha. Before plot installation careful prospecting was done with the objective of identifying main *Pinus pinea* forest types. Plots were established covering a wide range of ages, stand densities, and site qualities. Data from stem analyses were used in the study. A dominant tree, with diameter close to the mean diameter of the 100 largest-diameter trees per hectare [14], in each plot was cut down to 0.30 m and its total height was measured. Each tree was then cut at 1 m intervals until the diameter was 7 cm. The number of rings was counted at each cross sectioned point and then converted to stump age. The current age of a tree was estimated by adding to the number of counted rings at 0.30 m of the ground, three years. This number corresponds to the number of years necessary for the tree to reach a height of 0.30 m [15]. The age at a given level of the stem corresponds to the difference between the current age and the number of rings counted to this level [16, 17]. As cross-section lengths usually do not coincide with periodic height growth, we used the Carmean’s method [18] and the modification proposed by Newberry [19] to account for such bias. This was based on earlier studies [20, 21]. These corrections remove the bias when we assume that the height of the section is the maximum height attained at a given age.

Summary statistics of the modelling data can be seen in Table 1.

##### 2.2. Candidate Functions

Six dynamic equations with two site-specific parameters were tested using a dummy variable procedure [11]. These equations (hereafter referred as M1, M2, M3, M4, M5, and M6) are derived from the base models of log-logistic, Bertalanffy-Richards, and Lundqvist-Korf using the generalized algebraic difference approach (GADA) proposed by Cieszewski and Bailey [6] and Cieszewski [22]. These three tree-parameter base growth functions are commonly used for the development of site index curves and dominant-height growth modelling [23–32]. The two subsequent models (M1 and M2) may be viewed as special cases of the log-logistic class of models, which are equivalent to Hossfeld models [33]. The three following models tested (M3, M4, and M5) were based on the function proposed by von Bertalanffy [34] and studied by Richards [35]. The last model M6 was derived from the base function proposed by Lundqvist [36].

Table 2 presents the equations analysed (M1–M6). As a general notation convention, were used to denote parameters in base models, while were used for global parameters in subsequent GADA formulations. The dynamic equations have the general form where is the value of the function at age and is the reference variable defined as the value of the function at age .

##### 2.3. Model Fitting and Statistical Analysis

The methodology employed in this work is based on the generalized algebraic differences equations (GADAs) suggested by Cieszewski and Bailey [6] with the use of the fictitious variable method (dummy approach) to fit the equations [11].

For the GADA, it is generally needed that two parameters (the parameter related to the asymptote and one among the shape parameters) are functions of site productivity to allow simultaneously polymorphism and variable asymptotes for growth curves. Productivity is assumed to be dependent on an unobservable variable which can neither be measured nor defined [6, 10, 22]. The unobservable theoretical variable which represents the site productivity dimension is an unknown function of management regimes, soil conditions, and ecological and climatic factors, which cannot be reliably measured or even functionally defined [10]. While GADA assures base-age invariance of the derived algebraic forms, the base-age invariance of the model parameter estimates was assured by fitting them using the dummy variable approach [11, 37].

To illustrate this method, consider the model M3 (Table 2). The variable must be substituted by a sum of terms containing a site-specific parameter (an initial height) and a dummy variable for each tree: where is a dummy variable equal to 1 for tree and 0 otherwise. The first sum of terms collapses into a single parameter (an estimated height at the specified initial age ) unique for each tree during the fitting process.

The use of stem analysis data implies the autocorrelation among observations within the same tree (correlation between the residuals within the same tree), which invalidates the standard hypothesis testing [38]. Therefore, to account for this possible autocorrelation the error terms were modelled using a continuous-time autoregressive error structure (CAR()). This allows accounting for irregularly spaced, unbalanced data [38, 39], typical for many forestry data sets [40]. The CAR() expands the error terms in the following way [39]:
where is the th ordinary residual on the th tree (i.e., the difference between the observed and the estimated heights of tree at age measurements *j*), for and it is zero for is the -order autoregressive parameter to be estimated, and is the time distance (years) separating the th from the th- observations.

To evaluate the presence of autocorrelation and the order of the CAR() to be used, graphs representing residuals versus lag-residuals from previous observations within each tree were examined visually. The dummy variables method and the CAR() error structure were implemented by use of the SAS/ETS MODEL procedure [41], which allows for dynamic updating of the residuals.

To determine the order of the function CAR() in the growth model to be used to do the correction for the autocorrelations, we first fitted all the six studied models using nonlinear least squares without expanding the error terms to account for autocorrelation. It appeared that a trend in residuals as a function of height lag-residuals within the same tree was evident in all models, as expected because of the longitudinal nature of the data used for model fitting. Figure 2(a) provides an example with the M6.

**(a)**

**(b)**

**(c)**

##### 2.4. Model Comparison

The comparison of the estimates for the different models was based on numerical and graphical analyses of the residuals. Also, four statistical criteria obtained from the residuals were examined: root mean square error (RMSE), mean residual (Bias), adjusted coefficient of determination , and Akaike’s information criterion differences (AICd) [42]. The expressions of these statistics are summarized as follows.

Residual mean square error:

Mean residual (Bias):

Adjusted coefficient of determination: where , and , and are the measured, predicted, and average values of the dependent variable, respectively; is the total number of observations used to fit the model, and is the number of model parameters.

Akaike’s information criterion differences (AICd), which is an index to select the best model based on minimising the Kullback-Leibler distance, were used in order to compare models with a different number of parameters:
where and is the estimator of the error variance of the model obtained as follows:
According to Cañadas [43], the longevity of *Pinus pinea* is around 200–250 years. We used the dominant height predicted at 250 years for the best site quality as an additional comparison criterion with the other statistical criteria employed to compare the studied models.

Graphical analysis of the residuals and the biological realism of the fitted curves as well were an important step in comparing the different models. This is essential because curve profiles may differ drastically, even though the statistics and residuals of the fit are similar (e.g., [44]).

##### 2.5. Selection of Reference Age for Site Quality Evaluation

The practical use of the model to estimate site quality from any given pair of height and age data requires the selection of a base age to which site index will be referenced. Inversely, site index and its associated base age may be used to estimate dominant height at any desired age. Therefore, the selection of a base age becomes an important issue when only one observation of a new individual is available [26].

According to Goelz and Burk [7], the reference age should be selected taking into account three considerations: (1) the reference age should be less than or equal to the younger rotation age for common silvicultural treatments; (2) the base age should be close to the rotation age, and (3) the base age could be selected so that it is a reliable predictor of height at other ages. Diéguez-Aranda et al. [26] consider that the reference age could be selected as young as possible, in order to help in earlier decision making of the silvicultural treatments to be applied to the stand. Different base ages and their corresponding observed heights were used to estimate heights at other ages for each tree. The estimated heights were compared with the observed heights from stem analysis. The relative error in predictions (RE%) was then calculated as follows: where , , and are the measured, predicted, and average values of the dependent variable, respectively; is the total number of observations used to fit the model, and is the number of model parameters.

It is important to note that the statistic described in this section is only meaningful if the site-specific parameters are discarded because of the lack of repeated data within the same individual [26]. Otherwise, the site-specific parameters could be estimated and all the predictions would be identical whatever base age was selected.

#### 3. Results

Figure 2(c) shows the trends in residuals after the correction for autocorrelation was done using a second-order continuous-time autoregressive error structure CAR(2).

The parameter estimates for each model and their corresponding goodness-of-fit statistics are shown in Table 3. The two models M1 and M3 each present a nonsignificant parameter ( for M1 and for M3). For the other four models, all the parameters were found to be significant at 0.0001 level (including the site-specific parameters for each tree) except parameter in model M2 who is significant at only 2% level.

Among the 4 models for which all the coefficients are significant, M2 and M6 were the best models in terms of RMSE, and AICd with a slight superiority of model M6 (Table 3). In addition, by looking at the estimated values of dominant height at 250 years old (Table 3 and Figure 3), M4 and M5 show a low value (24 m) for the best site quality class which can be exceeded in the case of old stands and good fertility. So, the most interesting models are M2 (Hossfeld) and M6 (Lundqvist-Korf).

To compare M2 and M6, we used graphs showing(i)bias and root mean square error in height estimation by 5-year age classes (Figure 4),(ii)fitted height growth curves for different site index values overlaying the trajectories of observed height and current annual height growth curves observed for both models (Figure 5).

**(a)**

**(b)**

**(a)**

**(b)**

Considering the reduced number of observations (3 observations) whose age exceeds 47 years, the last observations were grouped in the 45-year old age class. Concerning the bias and RMSE in height predictions by age classes, Figure 4 shows that M6 produces lower values than M2. On another side, according to Figure 5(a), the height growth curves for M2 and M6 are quite similar until 40–45 years old, thereafter growth declines for M2. Regarding the asymptotes obtained with the two models, the asymptote around 37 m given by model M6 is biologically more probable than that of M2 (about 25.9 m). Finally, in accordance with Gadow et al. [45], the maximum height growth is reached at different ages depending on the site index. The height growth culminates earlier in better sites, but also declines more quickly after reaching the maximum [14]. In addition, from the experience and field observations, it is probable that the maximum growth in dominant height of *Pinus pinea* occurs in younger ages, around 4–6 years old [2]. With model M6 the maximum annual height growth takes place between 7 and 10 years old according to the site quality (Figure 5(b)), while with model M2 the maximum annual height growth takes place between 11 and 13 years of age according to the site quality. Therefore, at young ages, model M6 seems to adapt slightly better to the reality than model M2. Model M6 which explains 98.3% of the total variance and presents good fitting statistics (Table 3) provides also a random pattern of residuals around zero with homogeneous variance and no detectable significant trends for predicted heights and site index at a reference age of 30 years (Figure 6). Since site index is a fixed stand attribute which should be stable over time, a plot of the site index predictions against total age using M6 and of the stem analysis data was performed (Figure 7). The graph reveals the consistency of site index predictions over time except for the youngest ages (below 15 years).

**(a)**

**(b)**

Model M6 (the GADA formulation derived from the Lundqvist-Korf base function by considering two parameters as related to site productivity) was selected for height growth prediction of *Pinus pinea* plantations in northwest Tunisia:
where is the predicted height (meters) at age (years),
and and represent the predictor variables height () and age () at which is observed. Note that in (10) the site-specific parameters are discarded in a way similar to that used to discard the autocorrelation coefficients [46], because the general use of the model involves making predictions using observed height and its associated measurement age in new individuals [26].

#### 4. Discussion

We were aware of the restrictions of our data, having only two plots with measurements over the age of fifty years. The lack of permanent sample plots measured at young and old ages was also an important restriction. Therefore, model selection was viewed as a compromise between biological and statistical considerations, rather than as a pure exercise in statistical inference. The model selection procedure indicated that (10) had the most desirable characteristics that a site index equations should have, that is, it constitutes a parsimonious, dynamic site equation; it shows no trends in residuals; it predicts height equal to site index at base age, and, although it is not defined for age zero (according to Diéguez-Aranda et al. [27], this is not a fundamental property), the height limit when age tends to zero is zero; it is polymorphic with sigmoidal curve shapes and varying asymptotes; and it has the base-age and path invariance properties providing consistent predictions.

To determine the best reference age to define the site index of stands, we used model M6. Figure 8 suggests the selection of a reference age in the interval 30–40 years to define site index (lowest relative errors). However, after 30 years the number of observations decreases significantly. Despite that, according to Diéguez-Aranda et al. [26] who consider that the reference age should be selected as young as possible, and considering that a compromise needs to be found between the number of observations used in the estimation phase (too low after 30 years) and the relative error obtained, we conclude that a reference age of 30 years seems appropriate for *Pinus pinea *plantations in northwest of Tunisia.

Site index was then defined as the dominant height reached at 30 years old. According to the predicted values of this parameter and their distribution presented in Table 4, four quality classes were defined around the average (~10.50 m), with a 3 m step between each class.

Finally, Figure 9 presents the curves derived from model M6 (Lundqvist-Korf) for site index values of 6, 9, 12, and 15 m at reference age of 30 years overlaid on the trajectories of the observed heights over time. The values of the awaited heights at 100 years of age are 13.00, 18.37, 23.57, and 28.65 m, for site index values 6, 9, 12, and 15 m, respectively, which seems realistic. The curves should be used for ages greater than 15 years (Figure 7), since for younger ages erratic height growth may lead to erroneous classifications.

In the absence of other existing models for *Pinus pinea* in Tunisia, we will compare our results with those obtained in Spain by Calama et al. [1] and Piqué [2]. Both, Calama et al. [1] for different regions in Spain and Piqué [2] for Catalonia, using the Bailey-Clutter Model adjusted with algebraic difference approach (ADA) according to the methodology of Goelz and Burk [7], defined also 4 classes of site quality with mean heights reaching 21, 17, 13, and 9 m, respectively, at 100 years of age. Figure 10 presents the trajectories of the observed dominant heights over time for *Pinus pinea* in northwest of Tunisia superposed with site index curves for the three models: Lundqvist-Korf adjusted with GADA approach for site index curves for *Pinus pinea* plantations in northwest of Tunisia, Bailey-Clutter adjusted with ADA approach for national site index curves for *Pinus pinea* stands in Spain [1], and site index curves for *Pinus pinea* stands in Catalonia [2]. This graph shows that the site index curves obtained in northwest of Tunisia are more accentuated than those obtained in Spain. At the younger ages, the site index curves of northwest of Tunisia are more similar to those elaborated by Piqué [2] for the Catalonia region. This is also in accordance with the results obtained by Madrigal et al. [29], who developed *Pinus pinea* site index curves for natural and artificial stands in central Spain and they obtained, in the case of artificial stands, a higher growth rate at younger ages (before 30 years old). However it is important to point out that in our case the inflexion point of the curves it is not inferior to zero as Madrigal et al. [29] found in the case of artificial stands.

At the older ages when the two Spanish models are quite similar, the Tunisian model presents on the contrary higher asymptotes. Even though we do not have any information about size of old *Pinus pinea *trees in Tunisia, the asymptote given by M6 (around 37 m) is a realistic value from the biological point of view and in accordance with previous studies, where it is mentioned that old *Pinus pinea* trees can reach heights of about 35 m [47]. This fact can be attributed, on the one hand, to the high quality of the soils and the good precipitation in northwest of Tunisia and, on the other hand, to the methodologies used for the models construction (GADA for NW Tunisia and ADA for Spain).

#### 5. Conclusion

A site index model that describes the dominant height growth of *Pinus pinea* plantations in northwest of Tunisia has been developed, which is a substantial advance in modelling of the growth of this species since no previous site index model existed. The model selection procedure indicates that the generalized algebraic differences approach (GADA) used with the Lundqvist-Korf base equation fitted with the dummy variable technique resulted in the best compromise between biological and statistical aspects, producing the most adequate site index curves. It has the property of time invariant predictions ensuring that projections over the same period of time are equivalent, regardless of the length or number of projection intervals. This dynamic equation can provide predictions directly from any age-dominant height pair without compromising consistency of the predictions. The predictions are unaffected by changes in the base-age. The site equation is mathematically sound and always produces consistent predictions. The model has the property of being continuous and reciprocal. The range of application is between 15 and 60 years, since the data available to elaborate the model are distributed approximately between those limits.

The elaborated curves define 4 site quality classes (Figure 9), showing that at 30 years of age the corresponding mean dominant heights reach 15, 12, 9, and 6 m with asymptotic values of about 38, 31, 25, and 18 m, respectively. These asymptotic values seem to exceed slightly the heights that *Pinus pinea* can reach in Tunisian natural conditions. This fact is not very important, because it is probable that the natural death of trees takes place long before. According to sampled plots, 6% of the studied plantations have a dominant height which does not exceed 7.5 m at 30 years of age, 10% present at the same age a dominant height which exceeds 13.5 m, 44% have a dominant height located between 10.5 m and 13.5 m, and finally 40% have a dominant height located between 7.5 m and 10.5 m.

#### Acknowledgments

This study (Project no. A/4996/06) was carried out within the framework of the bilateral cooperation between Tunisia (INRGREF) and Spain (University of Lleida) and was financed by the Ministry for Scientific Research and Technology of Tunisia (MSRT) and the Spanish Agency of the International Cooperation and Development (AECID). The authors acknowledge the support received from EFIMED. In addition, the framework provided by AGORA project has facilitated the transfer of scientific methods from Portugal and Spain to Tunisia that has made it possible to conduct this study.