Abstract

Subsetting of samples is a promising avenue of research for the continued improvement of prediction models for soil properties with diffuse reflectance spectroscopy. This study examined the effects of subsetting by soil total carbon (Ct) content, soil order, and spectral classification with k-means cluster analysis on visible/near-infrared and mid-infrared partial least squares models for Ct prediction. Our sample set was composed of various Hawaiian soils, primarily from agricultural lands, with Ct contents from <1% to 56%. Slight improvements in the coefficient of determination (R2) and other standard model quality parameters were observed in the models for the subset of the high activity clay soil orders compared to the models of the full sample set. The other subset models explored did not exhibit improvement across all parameters. Models created from subsets consisting of only low-Ct samples (e.g., Ct < 10%) showed improvement in the root mean squared error (RMSE) and percent error of prediction for low-Ct soil samples. These results provide a basis for future study of practical subsetting strategies for soil Ct prediction.

1. Introduction

Diffuse reflectance spectroscopy (DRS) and chemometric analysis have become popular subjects of research for their potential to predict soil carbon and other soil properties. This methodology could be beneficial for monitoring soil quality and its temporal variation, as well as for facilitating digital soil mapping efforts. Both visible/near-infrared (VNIR) and mid-infrared (MIR) spectra show promise for the prediction of soil total carbon (Ct) and organic carbon, as well as organic matter, total N, total P, sand, silt, and clay fractions, cation exchange capacity, and pH (e.g., [1–8]). Particular attention has been given to soil carbon, which is an important indicator of soil fertility and biological activity and is crucial to carbon sequestration efforts [9–12].

Partial least squares regression (PLSR) appears to be the most widely used chemometric method for developing prediction models from soil diffuse reflectance spectra. A sample set is commonly divided into two groups with the larger used for calibration and the smaller for validation to approximate true independent model validation, but no clear or consistent guidelines have been adopted for this process. Model results are known to vary with different groupings of samples for the calibration and validation sets. To address this issue, some studies have created multiple models, each with different random divisions of the sample set into calibration and validation sets, to reflect the range of possible results [13, 14].

Highly accurate prediction models are required for DRS to be an effective method for soil carbon determination in practical applications. Many statistically robust models have been developed (e.g., [5–8, 15]), but a single procedure is not necessarily the best for producing high quality models from different soils in different locations. Even models that have excellent correlation between soil spectra and properties could be improved. For instance, the robust PLSR models of McDowell et al. [8] have relatively large errors in prediction at very low Ct values, which decreases the utility of the models in situations where low-Ct soils or small changes in Ct are examined. Additional methods are being explored to produce the most robust and accurate DRS prediction models possible for different local and global soil datasets. One promising idea is to split the sample set into groups based on similar characteristics and to develop individual prediction models for each of these subsets. In studies of soils from Poland, Brazil, and Florida (USA), previous researchers have investigated subsetting by characteristics such as carbon content, soil order, soil texture, and spectral similarity, with varied success for their particular sample sets [16–18].

The current work aimed to improve the prediction of Ct with VNIR and MIR DRS by creating attribute-specific chemometric models. Specifically, we investigated whether a chemometric model built only from a subset of samples that are similar with respect to a particular characteristic (e.g., Ct) provides better predictions than a comprehensive model built from a set of all possible samples. The study investigated the following three subsetting strategies: (1) soil Ct value; (2) soil order; (3) spectral classification with k-means cluster analysis. Each of the subset models was compared against the original full sample set model to assess the magnitude of changes in the predictions. This study built upon the research reported in McDowell et al. [8]. In that work, the authors demonstrated the ability of DRS to predict Ct in Hawaiian soils and also investigated the success of different wavelength ranges (i.e., VNIR versus MIR) and chemometric methods. Because these ideas have been explored previously in McDowell et al. [8], they will not be discussed further here.

2. Materials and Methods

2.1. Sample Collection and Preparation

The sample set for this study is composed of 307 soil samples collected across the five main Hawaiian Islands of Kauai, Oahu, Molokai, Maui, and Hawaii, illustrated in Figure 1. Two hundred and sixteen of these samples were collected from 1981 to 2007 and stored in the archive at the Natural Resources Conservation Service (NRCS) National Soil Survey Center in Lincoln, Nebraska, and the remaining 91 samples were newly collected in 2010. Within this full set of samples, 10 soil orders and more than 100 soil series are represented. Samples were predominantly from a variety of agricultural soils, hosting over 25 different crop types. The majority of samples are of surface soils (~77%), and the remainder are of corresponding subsurface soil horizons from 17 of the collection sites. The soil samples were dried and sieved to retain the less than 2 mm fraction for VNIR DRS analysis. A portion of each sample was also ball-milled to less than 250 μm for MIR DRS analysis.

2.2. Traditional Total Carbon Analysis

Dry combustion was used to measure the Ct of ball-milled soil samples. Several of the samples obtained from the NRCS archive had been measured for Ct by dry combustion before storage. All remaining samples were analyzed at the Agricultural Diagnostic Services Center (ADSC) at the University of Hawaii at Mānoa with a LECO CN2000 combustion gas analyzer [19]. A small portion of the previously measured NRCS archive samples were reanalyzed at ADSC to cross-check the values obtained from the different laboratories. The Ct values of the full sample set range from <1% to 56%, with a distribution weighted toward the lower end.

2.3. Visible/Near-Infrared Diffuse Reflectance Spectroscopy

Visible/near-infrared diffuse reflectance spectra were collected from the 2 mm sieved soil samples with an Agrispec spectrometer and muglight light source (Analytical Spectral Devices, Inc., Boulder, CO, USA). The Agrispec has three detectors with a combined spectral range of 350 to 2500 nm, sampling interval of 1 nm, and spectral resolution from 3 nm (at 700 nm) to 10 nm (at 1400 nm). Each soil sample was measured three times, with the sample cup rotated 20° between each measurement. The three spectra were averaged to produce the final spectrum for each sample. A Spectralon (Labsphere, North Sutton, NH, USA) white reference was measured as a reference spectrum to begin each session and again every 30 minutes or less thereafter. A slight offset in reflectance between the range covered by the first and second detectors was observed in many spectra, and, therefore, we removed the narrow region of 990–1010 nm from the final spectra for analysis. The VNIR spectra of these soils commonly exhibit features associated with OH, H2O, iron oxides, phyllosilicates, and organic molecules. For regression analysis the spectra were transformed using the pretreatment identified as most effective for this data set in McDowell et al. [8]. For the VNIR spectra, this optimal preprocessing transformation was mean normalization.
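The VNIR steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the processing chain used in the study (which relied on the instrument software and Unscrambler X): the function name `preprocess_vnir` is hypothetical, the 425–2450 nm trimming is taken from Section 2.5, and mean normalization is assumed here to mean dividing each spectrum by its own mean reflectance.

```python
import numpy as np


def preprocess_vnir(replicates, wavelengths_nm):
    """Hypothetical helper: average replicate scans, drop the detector-splice
    region, trim the noisy ends, and mean-normalize.

    replicates     -- array (n_scans, n_wavelengths), one row per rotated scan
    wavelengths_nm -- array (n_wavelengths,), e.g. 350..2500 nm at 1 nm steps
    """
    spectrum = np.asarray(replicates, dtype=float).mean(axis=0)  # average the scans
    wl = np.asarray(wavelengths_nm)

    keep = (wl >= 425) & (wl <= 2450)      # range retained for PLSR (Section 2.5)
    keep &= ~((wl >= 990) & (wl <= 1010))  # offset between detectors 1 and 2
    spectrum, wl = spectrum[keep], wl[keep]

    # Assumed definition of mean normalization: divide by the spectrum's own mean
    return spectrum / spectrum.mean(), wl
```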

2.4. Mid-Infrared Diffuse Reflectance Spectroscopy

Mid-infrared diffuse reflectance spectra were collected from the ball-milled samples in neat form with a Scimitar 2000 FTIR spectrometer (Varian, Inc., now Agilent Technologies, Santa Clara, CA, USA) and a diffuse reflectance infrared Fourier transform (DRIFT) accessory. The spectral range is 400 to 6000 cm−1, with a sampling interval of 2 cm−1 and a spectral resolution of 4 cm−1. (Note that the range of our MIR spectra overlaps slightly with that of our VNIR spectra.) Spectra were corrected for background atmospheric and instrument effects by subtracting the spectrum of KBr powder, measured once every seven samples, but features in two narrow regions persisted. Therefore, we excluded the regions of 1350–1419 cm−1 and 2281–2449 cm−1 from the analysis. Features in the MIR spectra of these soils are attributable to OH, organic molecules, and a variety of silicate minerals. Based on the findings of McDowell et al. [8], the Savitzky-Golay first-derivative transformation was applied to the MIR spectra before regression analysis, as this was determined to be the most effective pretreatment for this data set.
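A comparable sketch for the MIR pretreatment uses SciPy's `savgol_filter` with `deriv=1`. The window length and polynomial order are not reported above, so the defaults below (11 points, second-order polynomial) are placeholders, and `preprocess_mir` is a hypothetical name. The derivative is taken on the evenly spaced spectrum before the artifact regions are removed, so that the filter window never straddles a gap; this ordering is also an assumption.

```python
import numpy as np
from scipy.signal import savgol_filter

EXCLUDED_CM = [(1350, 1419), (2281, 2449)]  # residual artifact regions (cm-1)


def preprocess_mir(spectrum, wavenumbers, window_length=11, polyorder=2):
    """Hypothetical helper: Savitzky-Golay 1st derivative, then band removal.

    spectrum    -- array (n_points,) on an evenly spaced, ascending wavenumber grid
    wavenumbers -- array (n_points,), e.g. 400..6000 cm-1 at 2 cm-1 steps
    """
    wn = np.asarray(wavenumbers, dtype=float)
    step = wn[1] - wn[0]  # sampling interval, used to scale the derivative
    deriv = savgol_filter(np.asarray(spectrum, dtype=float),
                          window_length, polyorder, deriv=1, delta=step)

    keep = (wn >= 489) & (wn <= 5300)  # noisy ends trimmed (Section 2.5)
    for lo, hi in EXCLUDED_CM:
        keep &= ~((wn >= lo) & (wn <= hi))
    return deriv[keep], wn[keep]
```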

2.5. Regression Analysis

Partial least squares regression (PLSR) was employed to develop the chemometric models for Ct prediction. Models were generated using the Unscrambler X Software package (CAMO Software Inc., Woodbridge, NJ, USA). The spectral range included in the analysis was decreased slightly by removing any high noise portions at the limits of the range; therefore, the VNIR spectra were restricted to the range of 425–2450 nm, and the MIR spectra were restricted to 489–5300 cm−1. All spectra were mean centered for PLSR analysis. The optimal number of factors for regression was chosen individually for each model so as to maximize the explained variance while minimizing the possibility of overfitting. We considered several parameters when assessing the quality of models, including the coefficient of determination (R2), root mean squared error (RMSE), residual prediction deviation (RPD) [20], and the ratio of performance to interquartile distance (RPIQ) [21]. We defined the RPD as the ratio of the standard deviation of the validation set to the standard error of prediction (RPD = SD/SEP) and the RPIQ as the ratio of the interquartile distance of the validation set to the standard error of prediction (RPIQ = IQ/SEP), where the interquartile distance is the difference between the third and first quartiles (IQ = Q3 − Q1). With respect to these general model quality parameters, the best model would have the highest R2, RPD, and RPIQ and the lowest RMSE. We also examined the success of the predictions for individual samples using the percent error, calculated as the absolute difference between the measured (i.e., by combustion) and predicted (i.e., by DRS) values, divided by the measured value and multiplied by 100.
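The quality parameters defined above can be computed from paired measured and predicted Ct values as follows. This is a sketch, not the Unscrambler X implementation: R2 is taken here as 1 − SSres/SStot, SEP is assumed to be the bias-corrected standard deviation of the prediction residuals (a common convention that the text does not spell out), and the function name `model_quality` is hypothetical.

```python
import numpy as np


def model_quality(measured, predicted):
    """Hypothetical helper returning R2, RMSE, SEP, RPD, RPIQ, and mean percent error."""
    y = np.asarray(measured, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    resid = yhat - y

    rmse = np.sqrt(np.mean(resid ** 2))
    bias = resid.mean()
    # Assumed SEP definition: standard deviation of bias-corrected residuals
    sep = np.sqrt(np.sum((resid - bias) ** 2) / (y.size - 1))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

    q1, q3 = np.percentile(y, [25, 75])        # quartiles of the validation set
    pct_error = np.abs(y - yhat) / y * 100.0   # percent error per sample

    return {
        "R2": r2,
        "RMSE": rmse,
        "SEP": sep,
        "RPD": y.std(ddof=1) / sep,   # RPD = SD / SEP
        "RPIQ": (q3 - q1) / sep,      # RPIQ = (Q3 - Q1) / SEP
        "mean_pct_error": pct_error.mean(),
    }
```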

2.6. Sample Subsetting

The motivation behind our selected subsetting strategies was to improve prediction while still retaining the simplicity that makes DRS attractive. We focused on subsetting criteria that did not require additional highly detailed soil characterization, instead relying on general soil data and information within Soil Taxonomy.

2.6.1. Ct Content Subsets

A simple grouping of soils into low-Ct and high-Ct classes was used for subsetting by Ct value. Preliminary work iteratively tested a variety of low-Ct/high-Ct divisions (e.g., 2, 4, 6, 8, and 10% Ct). The initial results showed that a cutoff of 10% Ct was most promising, and it was therefore used for the final analysis. Additionally, a division at 10% allows for fairly easy assignment of unknown soils into low-Ct or high-Ct groupings from estimates based on general or readily available soil information.

To approximate independent validation, samples were randomly split into a group of 70% for model calibration and 30% for model validation. This random selection was repeated to produce 10 iterations of calibration/validation pairs from the full sample set. After this split, the samples from each iteration were divided into low-Ct (Ct < 10%) and high-Ct (Ct > 10%) subsets. Separate VNIR and MIR regression models were then developed from the low-Ct and high-Ct portions of each of the 10 iterations. For comparison, VNIR and MIR regression models from the full sample set using these same 10 calibration and validation divisions, but no separation by Ct value, were also created.
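The order of operations in this scheme, split first and subset second, can be sketched with NumPy. The helper `split_then_subset` and its random seed are hypothetical; it returns index arrays for each of the 10 iterations so that the full-set, low-Ct, and high-Ct models share the same calibration/validation assignment. Following the text literally, a sample with exactly 10% Ct would fall into neither subset.

```python
import numpy as np


def split_then_subset(ct, n_iter=10, cal_frac=0.7, cutoff=10.0, seed=0):
    """Hypothetical helper: repeated random 70/30 splits, then Ct subsetting.

    ct -- measured total carbon (%) for every sample in the full set
    Returns a list with one dict per iteration holding (cal, val) index arrays
    for the full set and for the low-Ct and high-Ct subsets.
    """
    ct = np.asarray(ct, dtype=float)
    rng = np.random.default_rng(seed)
    n_cal = int(round(cal_frac * ct.size))
    iterations = []
    for _ in range(n_iter):
        order = rng.permutation(ct.size)  # new random division each iteration
        cal, val = order[:n_cal], order[n_cal:]
        iterations.append({
            "full": (cal, val),
            "low": (cal[ct[cal] < cutoff], val[ct[val] < cutoff]),
            "high": (cal[ct[cal] > cutoff], val[ct[val] > cutoff]),
        })
    return iterations
```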

2.6.2. Soil Order Subsets

Four broad soil groups were created based on general similarity of soil order and the number of samples available of each type. The allophane-dominated volcanic Andisol soils comprised one group ( ), the Aridisol, Entisol, Inceptisol, Mollisol, and Vertisol soils were combined to make a second group (high activity clay soils; ), Oxisol and Ultisol soils made a third group (low activity clay soils; ), and Histosol and Spodosol soils comprised the fourth group (organic-dominated soils; ). These soil groupings are based upon information contained in Soil Taxonomy, allowing for the development of soil groups according to clay mineralogy and soil organic matter. Table 1 provides information on additional soil properties for each soil subset where available. The average spectra for each of these soil groups are shown in Figure 2. Nine soil samples from the NRCS archive had no recorded taxonomic classification and therefore were not included in these subsets.
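Because the grouping relies only on the recorded soil order, it reduces to a lookup table. The dictionary below is a hypothetical encoding of the four groups described above; samples with no recorded classification map to `None` and are left out, as the nine unclassified archive samples were.

```python
# Hypothetical lookup from soil order to the four groups used for subsetting
SOIL_ORDER_GROUPS = {
    "Andisol": "Andisols",
    "Aridisol": "high activity clay",
    "Entisol": "high activity clay",
    "Inceptisol": "high activity clay",
    "Mollisol": "high activity clay",
    "Vertisol": "high activity clay",
    "Oxisol": "low activity clay",
    "Ultisol": "low activity clay",
    "Histosol": "organic-dominated",
    "Spodosol": "organic-dominated",
}


def soil_group(order):
    """Return the subset name for a soil order, or None if unclassified.

    An unrecognized order name raises KeyError rather than being silently dropped.
    """
    if order is None:
        return None
    return SOIL_ORDER_GROUPS[order]
```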

The full sample set was randomly divided 10 times into a group of 70% of samples to be used for the calibration of the regression models and 30% of the samples to be used for validation. After this division, the samples of each of the ten iterations were grouped according to soil order as described above. Separate VNIR and MIR regression models were then developed for each soil group subset within each of the ten calibration/validation iterations. Because the number of low activity clay and organic-dominated soil samples was small (e.g., ≤80), full cross validation (i.e., leave-one-out cross validation) was used with the regression models for these two groups rather than committing 30% of those samples to validation as with the other subsets. Additional models were created from the 10 calibration/validation divisions of the full sample set with no separation of soil order for the comparison of results without subsetting. A full cross validation model of the full sample set was developed to be compared with the low activity clay and organic-dominated soil subsets’ full cross validation models.

2.6.3. Spectral Classification Subsets

Our rationale behind grouping soil samples by spectral character is based on the assumption that this approach removes major spectral variation from consideration so that small-scale variation is used to produce a more refined prediction model. Also, the division of soil samples into subsets created solely from spectral classification has the advantage of requiring no additional information about the soil.

The spectral classification subsets were created by k-means cluster analysis with Unscrambler X. Spectra were assigned to three cluster subsets based on the minimum Euclidean distance to cluster centers. Separate analyses were conducted for the VNIR and MIR spectra, resulting in different combinations of samples in their cluster subsets. The spectral range used for these cluster analyses was limited to the regions most relevant to carbon prediction as previously determined by the PLSR variable significance analysis of McDowell et al. [8]. Specifically, the ranges used were 600–750, 898–990, 1910–1938, 2070–2150, and 2288–2316 nm for the VNIR spectra and 1500–1870, 3650–3690, 4235–4260, 4305–4330, 4410–4455, and 5245–5280 cm−1 for the MIR spectra. Each cluster subset was randomly divided into a group of 70% for model calibration and a group of the remaining 30% for model validation, unless the number of samples in the cluster was small (e.g., ≤80), in which case samples were not divided and full cross validation was performed. The random division into calibration and validation groups was repeated nine more times to give 10 calibration/validation pairs for each of the VNIR and MIR cluster subsets. Separate prediction models were created for each of the different cluster subsets. For comparison, we also developed 10 VNIR and 10 MIR models from the full sample set. The calibration and validation groups for these models were created by combining the respective calibration or validation groups from the three different cluster subset models. VNIR and MIR full cross validation models using the full sample set were also produced for comparison with the full cross validation models from small cluster subsets.
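The clustering step can be sketched with scikit-learn's `KMeans`, again as an illustrative substitute for Unscrambler X. Only the wavelength or wavenumber ranges listed above enter the distance calculation. The helper names, the `n_init` and `random_state` settings, and the choice to normalize each range so that its lower bound comes first are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

# Ranges most relevant to Ct prediction, as listed in the text
VNIR_RANGES_NM = [(600, 750), (898, 990), (1910, 1938), (2070, 2150), (2288, 2316)]
MIR_RANGES_CM = [(1500, 1870), (3650, 3690), (4235, 4260), (4305, 4330),
                 (4410, 4455), (5245, 5280)]


def range_mask(axis, ranges):
    """Boolean mask selecting points of `axis` that fall in any of `ranges`."""
    axis = np.asarray(axis, dtype=float)
    mask = np.zeros(axis.shape, dtype=bool)
    for a, b in ranges:
        lo, hi = min(a, b), max(a, b)  # tolerate ranges written high-to-low
        mask |= (axis >= lo) & (axis <= hi)
    return mask


def spectral_clusters(spectra, axis, ranges, n_clusters=3, seed=0):
    """Assign each spectrum (row) to one of `n_clusters` k-means clusters."""
    X = np.asarray(spectra, dtype=float)[:, range_mask(axis, ranges)]
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(X)  # label of the nearest cluster center (Euclidean)
```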

3. Results and Discussion

3.1. Modeling of Ct Content Subsets

The VNIR models subset by Ct content produced the results summarized in Table 2 and plotted in Figure 3(a). The range of results from the 10 random divisions of the samples into 70% calibration and 30% validation groups is given along with their mean value. The R2, RPD, and RPIQ values for the low-Ct subset were not as good as those produced using the full sample set, though the RMSE values were lower for the low-Ct subset. The results for the high-Ct models approached, but were not quite as good as, the results from the full sample set.

Results from the MIR subset models are shown in Figure 3(b) and Table 3. The models produced by the low-Ct subset were generally of lesser quality than those of the full sample set, with the exception of better RMSE values, a trend similar to that of the VNIR models. The high-Ct models were comparable overall to the high quality models produced using the full sample set.

From these results, it appears that a separate high-Ct prediction model is not an improvement over a model utilizing the full range of available samples for either the VNIR or MIR spectra from this data set. The same may be true for a separate low-Ct prediction model, but the benefit of its lower RMSE should also be considered.

Results varied in previous studies examining the behavior of separate models based on carbon content. Madari et al. [16] found that limiting the Ct in their NIR and MIR calibration models to 0.4–99.10 g kg−1 and 0.4–39.90 g kg−1, respectively, decreased not only the R2 but also the root mean squared deviation (RMSD) compared to the original NIR and MIR models (0.4–555 g kg−1 Ct); this behavior is similar to that observed in the low-Ct models presented here. The study by Vasques et al. [18] developed separate VNIR organic carbon prediction models for their mineral and organic soil samples, which roughly correspond to a division by carbon content in this case (mineral soils, 0.01–14.70% carbon; organic soils, 13.52–57.54% carbon). Compared to the original combined model, the R2 improved for both subset models, but the RMSE decreased for the lower carbon mineral group and increased for the higher carbon organic group. The increase in R2 values for the subset models differs from what is seen in our work and that of Madari et al. [16] and is an example of soils with different characteristics responding differently to the same treatment.

3.2. Modeling of Soil Order Subsets

The results of the VNIR models from the soil order subsets are given in Table 4 and Figure 4(a). The models from the Andisol subset did not perform as well as the models using the full sample set. The R2, RMSE, and RPD values for the high activity clay subset were similar to those of the full sample set models, but the RPIQ values were generally slightly lower. The low activity clay and organic-dominated subsets were not validated with an independent validation set due to small sample numbers, and therefore their results may be overly optimistic. Compared to a full cross validation of a model created from the full sample set, the low activity clay subset model did not perform as well, except when considering the RMSE parameter, whereas the organic-dominated subset model is broadly similar.

Table 5 and Figure 4(b) show the results of the MIR soil order subset models. The models produced by the Andisol subset showed no improvement over the models produced by the full sample set. Results for the high activity clay subset models were as good as or better than the full sample set model results, with the exception of lower RPIQ values. The overall performance of the low activity clay and organic-dominated subset models using full cross validation was not quite as good as that of the full cross validation model from the full sample set.

These results suggest that a separate prediction model for the high activity clay soil orders may have a slight advantage compared to a model with all available soil orders for both the VNIR and MIR spectra of this data set. Separate prediction models for the other soil order subsets do not appear to be as promising.

A study by Madari et al. [16] also investigated the benefits of subsetting their samples according to soil order. The authors produced separate models for the Histosols and Spodosols, the Ferralsols (classification according to the World Reference Base [22], approximately equivalent to most of the Oxisol soil order), and the Acrisols (classification according to the World Reference Base [22], consisting of many Ultisol suborders and some Oxisols). The results of these models varied. The Ferralsol and Acrisol NIR and MIR models had lower R2 than the original model and also lower RMSD; these two subsets included relatively low Ct (2–85.10 g kg−1 and 1.70–91.60 g kg−1, resp.) compared to the full sample set (0.40–555 g kg−1), so this combination of lower R2 and lower RMSD parallels the behavior of the low-Ct subset models in the current study. The Histosol and Spodosol subset NIR and MIR models in Madari et al. [16] resulted in slightly higher R2 values and much higher RMSD values. Our Histosol and Spodosol (i.e., organic-dominated soils) subset models did not have significantly increased R2 values, but the validation RMSE values were greater than the full sample set models’ values.

Vasques et al. [18] developed separate VNIR organic carbon prediction models for each of the seven soil orders in their sample set, which consisted of soils from Florida in the southeastern USA. Compared to the original model containing all of these mineral soil samples, six of the seven soil order subset models resulted in improved R2 values (Alfisols, Entisols, Inceptisols, Mollisols, Spodosols, and Ultisols). The RMSE values were also similar or better for these subsets. The Histosol subset model was the only one that did not improve in R2 or RMSE. These results are somewhat different from those in this study, where only the high activity clay soils (i.e., Aridisols, Entisols, Inceptisols, Mollisols, and Vertisols) are suggested to provide an overall improvement over models including all available samples.

3.3. Modeling of Spectral Classification Subsets

The k-means cluster analysis of the VNIR spectra resulted in an unequal distribution of samples among the three clusters. The Cluster 0 subset consisted of only 78 samples (~3–56% Ct), and therefore all 78 samples were used in its model calibration and full cross validation. The Cluster 1 and Cluster 2 subsets contained 124 samples (~0–23% Ct) and 105 samples (~0–14% Ct), respectively, allowing for the independent validation of the models as initially planned. The results of the 10 VNIR prediction models from each of the clusters are given in Table 6 and Figure 5(a). A comparison of the Cluster 0 subset model with a full cross validation model of the full sample set showed that the subset model was not quite as robust, though it did produce a higher RPIQ value. The Cluster 1 and Cluster 2 subset models generally had lower (i.e., better) RMSE values but were otherwise not quite as robust as the full sample set models.

In the cluster analysis of the MIR spectra, the distribution of samples was heavily weighted toward the Cluster 0 (137 samples, ~0–52% Ct) and Cluster 2 (132 samples, ~0–11% Ct) subsets. The Cluster 1 subset contained only 38 samples (~15–56% Ct) and was validated with full cross validation instead of independent validation. Table 7 and Figure 5(b) present the results of the prediction models from the cluster subsets, as well as those from the full sample set models for comparison. The results for the Cluster 0 subset models are broadly similar to those of the full sample set models, but overall they are not an improvement. Results from the full cross validation of the Cluster 1 subset were slightly higher for calibration but much lower for validation than those from the full cross validation of the full sample set. In general, the Cluster 1 model is not as robust as the full sample set model. The overall performance of the Cluster 2 subset models is not quite as good as that of the full sample set models, but the limited Ct range of the Cluster 2 subset is apparent from its much lower range of RMSE values.

For this sample set, spectral classification by k-means clustering with a separate prediction model for each cluster was not an obvious improvement over the original full VNIR or MIR models. The most noticeable difference is the lower RMSE for the subset models from clusters limited to low Ct values.

We have found one other study that investigated the effect of subsetting a sample set by spectral classification for the prediction of soil carbon. Cierniewski et al. [17] tested the effect of four different unsupervised classification algorithms (k-means, expectation-maximization, Ward’s Euclidean distance, and Lance and Williams’ Euclidean distance) on simple linear regression results from VNIR data. These clustering algorithms produced five or six clusters, and the number of samples per cluster ranged from four to 56. This is in contrast to the k-means cluster analysis used in our study, where we specified that three clusters be produced to decrease the probability of a cluster containing too few samples for robust modeling. Cierniewski et al. [17] found that the majority of their cluster subsets had improved R2 values compared to the original full sample set. An increase in R2 was not observed for the spectral classification subsets in the current work. Instead, the most significant improvement was a lower RMSE for many of the cluster subset models. Because other parameters such as RMSE were not provided in Cierniewski et al. [17], it is difficult to determine whether their subsetting produced a similar effect.

3.4. Percent Error of Prediction

The subset models with improved RMSE values but otherwise less robust performance may still hold an advantage over the original full sample set model. If a more accurate prediction of the low-Ct samples makes a significant contribution to the lowered RMSE, the model could be very helpful in addressing the issue of large errors at low Ct values. To evaluate the error at these low values, the percent error of prediction was calculated for the samples with Ct values less than 10%, and the average value was reported for each model (Figure 6). We use percent error rather than RMSE for comparing the subset models with the full sample set model in order to normalize the error of each predicted value with respect to its measured value.
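The low-Ct error summary can be sketched as below, using the percent-error definition from Section 2.5. The function name is hypothetical. A measured Ct of exactly zero would make the percent error undefined, so this sketch assumes all measured values in the low range are positive.

```python
import numpy as np


def mean_percent_error_low_ct(measured, predicted, cutoff=10.0):
    """Average percent error over samples with measured Ct below `cutoff` (%).

    Assumes measured Ct > 0 for every sample in the low range.
    """
    y = np.asarray(measured, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    low = y < cutoff
    pct = np.abs(y[low] - yhat[low]) / y[low] * 100.0
    return float(pct.mean())
```

For instance, a sample measured at 1% Ct and predicted at 5% Ct contributes a 400% error, and one predicted at 1.8% Ct contributes 80%, consistent with the examples discussed for Figure 6.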

The mean value of the average percent error across the ten iterations of the VNIR full sample set models is ~160–200%, but the average percent error for a single model could be up to almost 400% (Figure 6). For example, with a measured value of 1% Ct, an error of 400% would translate to a predicted value of 5% Ct. The MIR full sample set models have lower average percent error, with a mean average percent error of ~135–150% and a maximum average percent error of ~200%. Many of the low-RMSE subset models have noticeably lower average percent errors. The low-Ct VNIR and MIR models and the Cluster 2 MIR models appear to have the most significant improvement, with average percent errors of ~80% or less. For a measured value of 1% Ct, a percent error of 80% would reduce the predicted value to 1.8% Ct. The Cluster 1 and Cluster 2 VNIR models also show moderate improvement, with all average percent error results below ~175%. The average percent error of the low activity clay soils full cross validation model is slightly lower than that of the full sample set model for both the VNIR and MIR data. The organic-dominated soils subset includes only two samples with Ct < 10%, so a comparison of average percent error is not as reliable in this case.

The subsets with the largest decreases in average percent error of prediction at low Ct content (i.e., Ct < 10%) are the ones that included only low-Ct samples in their models. The low-Ct VNIR and MIR models contained samples with Ct values between ~0 and 9.9%, and the Cluster 2 MIR models had samples with Ct values between ~0 and 11%. These results suggest that a separate model for low-Ct samples is beneficial for the accuracy of prediction for samples in this range. This advantage is indicated by the RMSE of the low-Ct models but may not be obvious from the R2 parameter. The issue of relatively large errors of prediction for samples with very low Ct content has been understudied. To our knowledge, no studies have provided quantitative information addressing the degree of scatter observed for low-Ct soils on most predicted versus measured plots.

3.5. Variation in Model Parameters

The ranges of PLSR model parameters produced by the 10 iterations of random calibration/validation set divisions in this study appear to be larger than the ranges of values encountered in previous studies where multiple PLSR model iterations were used. Brown et al. [13] reported results for five models produced from different random divisions of the sample set into 70% calibration and 30% validation groups. Values for organic carbon prediction from VNIR data ranged from 0.75 to 0.86 for R2, 1.08 to 1.26 for RMSD, and 1.95 to 2.62 for RPD. Mouazen et al. [14] included three model iterations with random divisions into 90% calibration and 10% validation groups in their study. The complete results are not reported, but visual estimation from plots of the mean and standard deviation of the R2 and RMSE from the organic carbon prediction models suggests that the variation is similar to or less than that in Brown et al. [13]. The greater range in model parameters observed in our study may be related to the testing of a greater number of iterations (i.e., 10 rather than five or three), or it could be related to a less obvious attribute, such as greater variation in spectral character within the sample set.

4. Summary and Conclusions

Our research has provided an introduction to the understudied idea of sample subsetting based on criteria that are simple and easily applied. This particular investigation of subsetting for Ct prediction had varied results with our Hawaiian soils sample set. Of all the different subset models created based on Ct content, soil order, and spectral classification, the subset of high activity clay soil orders was the only one to show improvement across all parameters (i.e., R2, RMSE, RPD, and RPIQ) compared to the full sample set. Notably, one significant advantage was discovered: the subsets including only low-Ct samples (e.g., the Ct < 10% subset and the MIR Cluster 2 subset) produced models with much lower RMSE values than the full sample set models, even though the other model parameters were not as robust. The lower RMSE for these models corresponds to a significant decrease in the percent error of predictions for low-Ct samples, which could be very helpful for the analysis of soils with low Ct content or for monitoring small changes in Ct. A low-Ct subset model could be incorporated into the future prediction of Ct values for unknown soils by first employing a model created with the full possible range of Ct values and then applying the separate low-Ct subset model if the soil is predicted to have low Ct.
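The two-stage strategy proposed above could look like the following sketch. The models are assumed to expose a scikit-learn-style `predict` method; the function name and the 10% Ct default threshold are assumptions of this illustration (the threshold mirrors the cutoff used in this study).

```python
import numpy as np


def two_stage_predict(X, full_model, low_model, cutoff=10.0):
    """Predict Ct with the full-range model, then re-predict low-Ct samples.

    X          -- spectra, one row per unknown sample
    full_model -- model calibrated on the full Ct range
    low_model  -- model calibrated only on samples with Ct below `cutoff`
    """
    X = np.asarray(X, dtype=float)
    pred = np.ravel(full_model.predict(X)).astype(float)  # copy, safe to modify
    low = pred < cutoff  # samples the full model places in the low-Ct range
    if low.any():
        pred[low] = np.ravel(low_model.predict(X[low]))
    return pred
```

Routing on the full model's prediction, rather than on measured Ct, keeps the procedure usable for unknown soils, at the cost that samples misclassified by the first model near the cutoff are handled by the less suitable model.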

As seen from this study and previous studies, the effect of subsetting can have different results depending on the character of the sample set and the number of samples it includes. A small sample size may have limited the improvement possible by subsetting in the current work. In an effort to keep the size of subsets large enough for regression analysis, the subsetting may have been too coarse (e.g., too few subsets for prediction by soil order and spectral classification). The types of subsetting strategies explored here may be most helpful for large datasets and should be tested with further research. Regardless of the strategy used to develop a model, our results suggest that multiple iterations of models with different calibration/validation groupings may help to produce a more complete picture of the overall model quality.

Acknowledgments

This research was supported by USDA CSREES TSTAR Project 2009-34135-20183 and UHM College of Tropical Agriculture and Human Resources (CTAHR) Hatch Project HA-154. The authors thank J. Hempel, L. West, T. Reinsch, L. Arnold, and R. Nesser of the NRCS National Soil Survey Center in Lincoln, NE, USA, for help with access, sampling, and scanning of the archived samples; L. Muller and A. Quidez for help with scanning of samples at UHM; and Drs. G. Uehara, R. Yost, and D. Beilman of UHM for the support of this project. They also appreciate the Hawaii landowners, managers, and extension agents who gave them access to their fields for collecting soil samples. These include from Kauai: R. Yamakawa and J. Gordines (CTAHR), S. Lupkes (BASF), and Grove Farms; from Oahu: R. Corrales, A. Umaki, and J. Grzebik (CTAHR), Hoa Aina, MAO Organic Farm, Nii Nursery, J. Antonio and M. Conway (Dole), C. and P. Reppun, L. Santo, T. Jones, and N. Dudley (HARC), and A. Sou (Aloun Farms); from Molokai: A. Arakaki (CTAHR), K. Duvchelle (NRCS), and R. Foster (Monsanto); from Maui: J. Powley and D. Oka (CTAHR), M. Nakahata and M. Ross (HC&S), T. Callender (Ulupono), and B. Abru.