Abstract

Discriminant analysis classification (DAC) and decision tree classifiers (DTC) were used for digital mapping of soil drainage in the Bras-d’Henri watershed (QC, Canada) using earth observation data (RADARSAT-1 and ASTER) and soil survey dataset. Firstly, a forward stepwise selection was applied to each land use type identified by ASTER image in order to derive an optimal subset of soil drainage class predictors. The classification models were then applied to these subsets for each land use and merged to obtain a digital soil drainage map for the whole watershed. The DTC method provided better classification accuracies (29 to 92%) than the DAC method (33 to 79%) according to the land use type. A similarity measure () was used to compare the best digital soil drainage map (DTC) to the conventional soil drainage map. Medium to high similarities () were observed for 83% (187 km2) of the study area while 3% of the study area showed very good agreement (). Few soil polygons showed very weak similarities (). This study demonstrates the efficiency of combining radar and optical remote sensing data with a representative soil dataset for producing digital maps of soil drainage.

1. Introduction

Much of Canada’s agricultural land has been mapped at reconnaissance (scale 1 : 125 000) or semidetailed scales (scale 1 : 50 000 or 1 : 63 000). Although these maps are useful for watershed management purposes, the soil information contained within them is often obsolete is typically of poor uniformity, and, nearly, always lacks the precision required to support land use decision-making or modelling efforts. The production of maps at more detailed scales, as well as the inclusion of additional soil drainage information, is thus required for map users to effectively inform land use and management decisions [1]. Mapping soil drainage classes with accurate and objective information is important for both crop productivity and hydrological modelling. Conventional soil mapping methods are laborious and expensive, requiring intensive soil sampling over large areas. Soil surveyors usually identify typological landscape units by their characteristics (i.e., form, geomorphology, vegetation, geology, etc.) and then select representative locations for describing and sampling soil profiles. The number of profiles needed to represent a landscape unit depends on the prospective scale and soil variability. In addition, conventional soil maps are generally created using a polygon-based approach, where different soils on the landscape are represented as polygons with discrete borders. Using this approach, representative soil attributes for a soil are usually assigned to a whole polygon and result in maps that contain abrupt transitions between neighbouring polygons [2]. Although this approach often sufficiently serves the needs for representing the spatial distribution of soils over a landscape, it is not well suited for quantitative environmental modeling. This is because the spatial and thematic content of such maps are limited [2]. New advances in digital soil mapping provide promising new approaches that directly address the above issues by using spatial prediction models to derive soil property information [3].

By using ancillary and drainage-related variables measured on a continuous basis, raster-based models can better represent the continuous nature of soils. During the last decade, remote sensing data has become the first choice for updating and upgrading soil survey information. The increasing availability of satellite images and the launch of new sensors with spatially and spectrally finer resolutions have encouraged the development of this approach [4]. Soil drainage classes are determined on the basis of the depth and the duration of local water table regimes, which are assessed by examining soil texture and the type, degree, and depth of soil mottling and gleying in soil samples [5].

Both optical and radar remote sensing have the potential to map soil information, such as soil drainage. In optical remote sensing, the characteristics of the reflectance spectrum from a soil surface are related to the biophysical and chemical properties of the soil profile. Several spectral indices (brightness, redness, and coloration index) extracted from visible and infrared satellite data presented close relationships with various soil color components and soil properties of the surface layer: texture, organic matter, soil moisture, and so forth [6]. Mattikalli [7] pointed out that these characteristics were directly related with soil surface colour and brightness as well as other soil properties, such as texture composition, moisture, and organic matter content. Soil reflectance was found to change exponentially as a function of volumetric soil moisture [8]. Soil biophysical properties can be extracted from fine resolution measurements of soil reflectance spectrum in the 400–900 nm range [9]. Liu et al. [10] showed the usefulness of Compact Airborne Spectrographic Imager (CASI) hyperspectral data in mapping soil properties and delineating within-field soil management units.

For radar remote sensing, the magnitude of the backscattering coefficient of a soil surface is governed by the dielectric constant and soil surface roughness. The dielectric constant, in turn, is strongly dependent on the soil moisture content and, to a lesser extent, on soil texture composition (percent of sand and clay) [11]. In agricultural applications, monopolarized synthetic aperture radar (SAR) data are mostly limited due to the low separability of the different scattering processes caused by soil and crop properties. It is more useful when monopolarized data are combined within a multitemporal series analysis [12, 13]. Multipolarized SAR data have also been used successfully for stratifying a field into soil management units, particularly under low vegetation cover conditions [14]. Polarimetric SAR data analysis is mostly recent tool and it has been found promising in some agricultural applications [15]. Polarimetric decompositions suggested by Cloude and pottier and by freeman [15] could bring a potential solution to improve soil drainage classification models [16].

Remote sensing data have also been combined with terrain attributes for classifying soil drainage in many studies [5, 1720]. They integrated the digital elevation model (DEM) with spectral indices. The spectral indices can be used as indicators of prevailing ephemeral factors such as the amount of vegetation [21]. In areas under natural vegetation, the association of vegetation type with soil or the impact of a soil property on biomass accumulation can be exploited for mapping soils using remote sensing data [21, 22].

Conversely few studies combine optical and radar remote sensing data for mapping soil properties. We investigated this relatively novel approach which takes the advantage of exploiting soil dielectric and spectral properties for mapping soil drainage variability. Due to livestock farming, diversified agricultural practices were evident in our study area, from spring bare soil conditions to vegetation consisting in cereals, forage and pasture, annual crops, forest and wetland. Thus, we analysed the land use type impacts on the classification accuracies of soil drainage classes. Firstly, in order to explicitly evaluate these impacts, a classification of the land use (maximum likelihood classification) and a selection of optimal subset of raster (forward stepwise procedure) were made in order to improve the classification accuracies. Two methods of digital soil mapping, that is, discriminant analysis and decision tree classifiers (DACs and DTCs), were applied on a training dataset and then validated with the remaining soil survey database. Due to the polygon-based approach of the conventional soil map [2], it was difficult to compare digital soil drainage maps derived from remote sensing with the conventional soil drainage map. We used a similarity measure based on the probability (proportion) of presence of each drainage class within all individual polygons to evaluate the agreement between these maps.

2. Materials and Methods

2.1. Study Site

The Bras d’Henri watershed is located about 30 km in the south of Quebec City (QC, Canada). The geographic boundary of the study area encompasses a watershed of approximately 167 km2 of which about 66% is now devoted to agriculture (field crops, pastures, etc.). The remaining territory consists of forest, wetlands, or urban areas. This watershed is influenced by a humid continental climate that exhibits large seasonal temperature contrasts with hot summers (and often humid) and cold (sometimes severely cold) winters, with large amount of precipitation (1150 mm) mainly as rainfall (Environment Canada http://www.ec.gc.ca/).

Soils of the study area have been previously mapped and described [23]. Two main physiographic areas are present in this watershed (Figure 1) from upstream to downstream (south to north), these are (a) the Appalachian plateau and highlands (189–380 m relief) and (b) the Saint Lawrence lowlands (115–202 m relief). The bedrock dates from the Cambrian to the Ordovician periods and consists of shales, sandstones, slates, and limestones. Parent materials mainly encountered in the watershed are till, fluvial, fluvio-lacustrine, and alluvial deposits. Generally, the soils are deep, well to very poorly drained, and located on flat, undulating, hilly (slopes 1 to 15%) or inclined (15 to 30%) landforms. They are developed over a loamy (56%) or sandy (27%), acidic or neutral parent material. They mostly belong to the Gleyed or Orthic Humo-Ferric Podzols (Aquic to Typic Haplorthods), Orthic Gleysols (Aquepts), or Orthic Humic Gleysols (Humaquepts). Acidic organic soils (Histosols) have accumulated (11%) in depressional (slopes <2%) areas as swamps covered by a veneer of humic material over mineral soils (Terric Humisols) or as deep bogs (Typic Humisols).

2.2. Soil Data Collection
2.2.1. Soil Survey Data

A conventional soil survey was conducted during summer 2004 and 2005 across the watershed to update and upgrade available soil information [23]. Soil description and sampling were based on a stratified random transect method [24]. This method can be summarized as the collection of available geological, parent material deposit, vegetation, soil, and climatic information. Based on this information, homogenous soil polygons were delineated using aerial photography. These polygons can be considered as strata. Transects along to major parallel environmental gradients (200 m and 150 m apart) were positioned within each polygon and soil profile data were collected along these transects. Additional transects were distributed throughout the study area to characterize all segments of the landscape within the watershed. This sampling approach has been found a useful and flexible design for mapping soil properties [24].

Semidetailed soil observations were realised at the survey intensity level 3 (SIL3) which is commonly used for watershed study (1: 40 000 scale, sampling density: 1 soil profile per 16 ha). Detailed soil sampling (SIL2, 1 : 20 000 scale, sampling density: 1 soil profile per 4 ha) was performed in two representative microwatersheds (3 km2 each) of the Bras d’Henri watershed (Figure 1).

The soil properties described for each soil sample include 14 morphological features regularly used for soil drainage classification: horizon type, mottling description, depth to gley, color (hue, value, and chroma), structure description, and texture. Five out of seven soil drainage classes were identified in the study area (Figure 2). These classes (D3–D7) were characterized by soil scientists of the Pedology and Precision Agriculture Laboratories of Agriculture and Agri-Food Canada according to the Canadian System of Soil Classification [25]. The soil survey map and its legend [23] were used to derive the conventional soil drainage map (association of soil series) at 1: 40 000 scale. Four map unit composition models are commonly used in Quebec soil survey reports [23] according to the number of soil components: one soil component (100%), two soil components (60% + 40%), three soil components (50% + 30% + 20%), and four soil components (40% + 30% + 20% + 10%). When a soil component covers two drainage classes, each class is considered to be equally represented in the soil polygon.

The collected morphological soil survey dataset consists of 1610 soil profiles. For training the supervised classification models (DAC and DTC), the published conventional soil drainage map and the soil survey dataset were used. After comparing soil profiles drainage classes (20 m buffer zone) to conventional soil drainage map, only twenty percent representative soil profiles (), showing a standard deviation equal to 0, were used as training dataset. This dataset belongs to the dominant soil drainage classes identified on the conventional soil map (soil inclusion not allowed). While many biological/ecological studies use 75% of the dataset for training, and 25% for validation [26], the inverse was used here. The expectations of this approach are to build a model that can produce reliable and more precise soil drainage map using a minimum and representative soil dataset as training in order to reduce the cost required for updating and upgrading conventional soil map (Figure 5).

The validation of the model (classification accuracy and Kappa coefficient) was made using all soil profiles included in the soil survey database. The distribution of the training soil profiles according to soil drainage classes is presented in Table 1.

Soil drainage classes observed in the Bras d’Henri watershed are mainly those characterizing an aquic soil moisture regime (D5, D6, and D7 classes). These three classes represent 87.8% of soil profiles included in the dataset, the D6 class having the largest percentage (51.7%). At the opposite, soil drainage classes D3 and D4 present the lowest percentage of occurrence (8.6% and 3.7%, resp.). As shown in Table 1, the selected training dataset are well representative of the distribution of drainage classes of the watershed.

2.2.2. Soil Moisture Monitoring

Soil moisture (surface and deeper horizons in the profile) and its variation are strongly related to soil drainage. RADARSAT-1 (SAR C-HH) signal is sensitive to soil moisture in top few centimetres (0 to 10 cm) in the soil. Penetration depth varies depending upon the soil dielectric, incidence angle and frequency [27]. Under bare soils conditions and moderately vegetation area, it can be expected that the temporal persistence of soil moisture patterns is reflected in the spatiotemporal behaviour of the radar backscattering coefficient [28]. The analysis of the potential capability of RADARSAT-1 multitemporal data for soil drainage estimation is then required. At each acquisition (Summer 2005, 5 dates), the gravimetric soil moisture content [29] was measured in six fields (1 profile per field × 5 depths per profile × 3 soil cores per depth) at the time of the satellite overpass. Volumetric soil moisture content was obtained by multiplying gravimetric soil moisture content by bulk density measured by the double-cylinder core method [30]. These six benchmark sites were selected to represent the diversity of soils (texture and drainage classes) identified in the Bras d’Henri watershed [23].

2.3. Remote Sensing Data Acquisition
2.3.1. RADARSAT-1 Images

Characteristics of the five RADARSAT-1 images (pixel size: 12.5 m) acquired in summer 2005 are shown in Table 2. All RADARSAT-1 images were orthorectified using a digital elevation model with a spatial resolution of 20 m and calibrated with PCI-Geomatica® (version 10.3, 2009-10-09). A Gamma filter with a 5 × 5 pixel window was applied to reduce the speckle noise. A 3 × 3 pixel window centred on each georeferenced soil profile was used to extract an accurate mean power values.

The pedoclimatic conditions (precipitation and soil moisture) during the acquisition periods are shown in Figure 3. This graphic shows that the mean volumetric soil moisture content, computed from benchmark sites (6), decreased regularly with the Julian day () at the beginning of the growing season when soil is drying, mainly due to natural and artificial soil drainage. These results indicate the potential of multitemporal RADARSAT-1 image information when their acquisition is done under spring climatic conditions for discriminating soil drainage classes. At the end of winter, soils are wet due to high precipitation and snow melt. On May 3rd, the image was acquired after a few rainy days, while on May 27th, the image was acquired after several days of low rainfall interspersed with many dry days (Figure 3) with normal seasonal air temperatures (about 15°C). Under these conditions, SAR data can provide some information about drainage state because the period of time between the two acquisitions corresponds to the 24-day repeat orbit for RADARSAT-1. The acquisition of RADARSAT-1 images separated by fewer days could be helpful also depending on the ability of water to infiltrate into the soil during a shorter period.

2.3.2. ASTER Image

Radar signal is both attenuated and scattered by vegetation; this has limited application of active radar methods to surfaces with little or no vegetation cover. In order to evaluate the contribution of optical remote sensing on the soil drainage classification, only one ASTER image acquired on June 3rd, 2005 free-cloud was available for this study. The spectral band characteristics of the image and their spatial resolutions are shown in Table 3. This acquisition date corresponds to the beginning of the growing season. With soil surface moisture, the vegetative growth and its development can be considered as soil drainage indicators. Campling et al. [5] combined terrain attributes with various spectral indices calculated from the Landsat Thematic Mapper (TM) images to predict soil drainage probability in a humid tropical area. They proposed to use three spectral indices [5]. The first one is the normalized difference vegetation index (NDVI) computed in the visible and the near-infrared (1). It is the most common index used in remote sensing. The NDVI is often correlated to LAI (leaf area index) [31] and differentiates between green vegetation and soil background. However, several studies shown that the NDVI is also affected by soil properties related to their colour and brightness [32], mainly in areas of sparse vegetation where the soil is the dominant spectral component. The two other indices used by Campling et al. [5] are the normalised difference infrared index (NDII) and the normalized difference water index (NDWI). The NDII (2) has been found correlated with plant moisture content [33]. The NDWI (3) combining of the near-infrared with the short-wave infrared removes variations induced by leaf internal structure and leaf dry matter content improves the accuracy in retrieving the vegetation water content [34]. All of these indices can help to predict soil drainage classes.

The equivalent spectral bands for ASTER data are in the visible red band (VNIR_Band2: 0.63-0.69 μm), in the near-infrared (VNIR_Band3N: 0.78-0.86 μm), and in the short-wave infrared channels (SWIR_Band4: 1.6-1.7 μm and SWIR_Band6: 2.18-2.22 μm):

Before extracting reflectance values of these bands, the ASTER image was orthorectified using the same procedure as RADARSAT-1 images. The atmospheric correction method 6S [35] was also applied on the image to get rid of unwanted atmospheric effects on the observed at-sensor radiance.

In order to bring the data to the same spatial resolution as the RADARSAT-1 images, the spectral bands of theses three indices (15 m and 30 m) were subsampled and projected in the same projection system as RADARSAT-1 data. An arithmetic integration method was used for merging the differing scale datasets. This method has demonstrated good color contrast compared to original imagery [36].

2.4. Methodological Framework Description

The schematic diagram of the framework adopted for soil drainage classification is illustrated in Figure 4. First, a supervised maximum likelihood classifier was applied on the ASTER image for classifying land use types (LUT). These LUT were (1) bare soils (N1), (2) bare soil with sparse vegetation (N2), (3) pasture and forage (P), (4) emerged crops, mainly annual crops (C), (5) forest (F), (6) wetland (H), and (7) water (W). Cluster masks were derived for each LUT using an opening morphological filter for limiting impurities in masks.

RADARSAT-1 power values and spectral indices (1), (2), and (3) were extracted under 20 m buffer zone for each soil profiles. These data were standardized to the same scale [37]: where is the target variable standardized from 0 to 1 range, and and are the physical minimum and maximum of z, respectively.

To derive an optimal subset of predictors for discriminant analysis model, a forward stepwise variable selection was applied on the standardized data. The Wilks’ lambda was used for this purpose. This statistic is used in multivariate analysis of variance (MANOVA) to test the differences between the means of identified groups of subjects on a combination of dependent variables. The initial model is defined by starting with the variable which allows the highest separation between soil drainage classes. The model is then extended by including further variables depending on the Wilk’s lambda criterion. It consists to select the one which minimizes the Wilk’s lambda of the model including the variable if its value still shows statistical significance. The corresponding statistic and the tolerance () are also calculated. A small (close to 0) value of Wilks’ lambda means that the groups are well separated, while conversely, a large (close to 1) value of Wilks’ lambda means that the groups are poorly separated.

The meaningful application of linear discriminant analysis requires that the data be normally distributed. The Shapiro-Wilk’s (SW-W) normality test was achieved with an SW-W ≥ 0.82 for all predictors. Thus, no transformation was made on data.

Using discriminant analysis approach for soil drainage mapping is not new [19, 38]. In this study, instead of using the canonical discriminant analysis, a linear discriminant analysis, that is, Fisher’s linear discriminant [39] was performed.

The approach consists in computing classification functions from selected data to accurately identify each soil drainage class. These classification functions determine the class to which each soil profile of the validation dataset most likely belongs. There are as many classification functions as soil drainage classes (5). Each function allowed the computation of classification scores for each soil profile within each class, by applying the formula: where the subscript denotes the respective class; the subscripts denote the variables; is a constant for the th clas; is the weight for the jth variable in the computation of the classification score for the th class; is the observed value for the respective soil profile for the jth variable; is the resultant classification score.

These linear functions were applied to each LUT (mask) with representing the most contributing image attributes selected in the model determined by the stepwise analysis. For example, with bands, there would be a set of 6 + 1 associated weights for each classification function. Therefore, with five drainage classes, there would be a matrix, W, of 35 weight values, and each pixel would be assigned to one of the five drainage classes by maximum likelihood estimation.

We also investigated an approach based on the decision tree classifiers for soil drainage classification. The main advantage of using this method is that it does not require any assumptions about the distribution of the data (as is the case for the linear discriminant analysis). The DTC technique has been successfully used for land cover classification using remote sensing data [40]. It is based on a hierarchical decision scheme, called a “tree.” The tree is composed of a root node (containing all data), a set of internal nodes (splits), and a set of terminal nodes (leaves). Each node (soil profile of the training dataset) of the decision tree structure corresponds to a binary decision that separates either one class or more from the remaining classes.

The importance score method [40] is used to evaluate the most contributors’ variables in DTC classification.

Let be the sum of the of the squared deviation at node for a split of t into and . The optimal split is the one that maximizes the R, where The measure of importance of each feature i (RADARSAT-1 images and spectral indices) is defined as: The normalized quantity (IS = importance score) is expressed as: The feature with score “0” is considered as the least important one and can be rejected from the regression tree. A score “100” is considered as the highest score, which indicates that the feature is the most significant one.

The main problem in the application of the decision algorithm is to find best classification tree which gives the simplest tree (in which all soil drainage classes appear) and provide the least classification error. In our case, a cross-validation method was used. It consisted of removing a random subset of 36% of the data (cross-validation dataset), building a tree using the 64% remaining data (subtraining dataset), and applying the decision tree to classify the cross-validation dataset. This procedure was repeated on ten cross-validation subsets, one at a time.

A classification tree was made for each land use type and applied on the features images under the clusters mask. These masks were merged to obtain a digital map of soil drainage classes for the whole watershed.

To assess classification accuracy, the traditional error matrix based on the overall accuracy (, which is the sum of diagonal matrix elements) and the Kappa coefficient were used. The Kappa coefficient () was computed as: where is the chance of agreement between the observed and classified categories, while and are proportions of samples that are observed and classified as category i, respectively. Following [41], the agreement between the classified and observed categories can be divided into five levels according to the value of κ: (1) κ = 0.0–0.2, slight agreement; (2) κ = 0.2–0.4, fair agreement; (3) κ = 0.4–0.6, moderate agreement; (4) κ = 0.6–0.8, substantial agreement; (5) κ = 0.8–1.0, almost perfect agreement.

Finally, the degree of association between the best digital soil drainage classes map (DTC) and the conventional soil drainage map was analysed. Guidance in the ecological application of similarity and dissimilarity or distance measures is provided by P. Legendre and L. Legendre [42]. In the context of image analysis, the application of similarity and dissimilarity measures is more complex and diverse. However, some general approaches in this field are provided by Di Gesù and Starovoitov [43]. Due to conceptual aspect of soil drainage map, we investigated the weighted Gower’s similarity coefficient model [42] for the association between the digital map derived from remote sensing and original map. This model is adapted as we used the proportion of drainage class or drainage probability (DPij) in soil map polygons. For each drainage class () DP is the area occupied by this class per the total area of given polygon j. For the conventional soil drainage, the proportion of each soil drainage class (DPrij) was based of the map unit composition model.

The difference between drainage probabilities was computed () between the digital soil drainage (DPi1) and the conventional soil drainage map (DPr2) taken as reference. This difference is then divided by the largest difference () found for a given soil drainage class. Since this ratio is a normalized distance, it is subtracted from 1 to transform it in a similarity index (): The final form of Gower’s coefficient is where is the Kronecker’s delta of ( when the information about DPj is missing for one or the other soil drainage class, when information is present for both drainage classes).

3. Results and Discussions

3.1. Relation between RADARSAT-1 Information and Soil Drainage Classes

The mean backscattering coefficient computed from the five RADARSAT-1 images under bare soil conditions (N1 + N2) shows difference responses to soil drainage classes (Figure 6). From the first image acquired on May 3rd, it is difficult to discriminate between D6 and D7, as well as between D4 and D5, probably due to the high soil moisture level (around 30%), near the saturation level. The magnitude of these differences was more significant for the two images acquired after May 27th (<2 dB), which reflects the natural drainage response, that is, its effect on soil moisture content (Figures 3 and 6). The May 27th image showed the greatest backscattering coefficient difference (3 dB) between extremes soil drainage classes (D3 and D7). Moreover, a gradient of backscattering coefficient as a function of soil drainage classes can be noticed on this date. The two RADARSAT-1 images acquired with the same SAR configuration (S2a; Table 2), that is, on the May 3rd and May 27th, showed the most differences in terms of mean backscattering coefficient between soil drainage classes. These observed differences being independent of the acquisition characteristics and, under bare soil conditions, could be associated to the variation of soil drainage status when the roughness surface was not change.

Under vegetative cover, it was more difficult to discriminate soil drainage classes (Figure 7) with the HH monopolarized RADARSAT-1 images. The scattering mechanism under these surface conditions are complex and commonly considered to be volume scattering from the branch and leaf layer, direct ground scattering, double-bounce scattering between stems and ground, or branches and ground. The mean backscattering coefficients of the land use type classified as vegetative areas are annual crop (C), pasture and forage (P), forests (F), and wetlands (H). Only the extremes soil drainage classes (D3 and D7) can be distinguish.

3.2. Land Use Type Classification

The overall LUT classification accuracy was 86% (Table 4) and the associated Kappa coefficient was 0.50. These accuracy results are judged acceptable considering that the ASTER image used for deriving the LUT classification model was acquired in early June 2005 when many land use types show rapid temporal evolution due to variable plant grow status, weed emergence intensity, and agricultural practices (ploughed, chisel, no-tilled, presence of crop residues, seeded, etc.).

The distribution of selected soil profiles () dataset according to LUT derived from a supervised maximum likelihood classification applied on the ASTER image attributes is shown in Table 5. A total of 25.4% of soil profiles were located under bare soil conditions (N1 + N2) which covered 81.7 km2 of the study area. About 74.6% of the study area (168 km2) was under various vegetative covers (C + P + F + H). This result emphasizes the diversification of agricultural practices in the watershed where mixed farm system (e.g., farms combining dairy and hog production) is frequently observed associated to a high diversity of crops produced for supporting animal production systems, including forage, cereal, soybean, and corn.

3.3. Variable Selection

A forward stepwise variable selection was applied to identify the best predictors to be used in soil drainage classification model using the DAC method (Table 6) and DTC method (Table 7) for each LUT. Under all LUT, the three spectral indices derived from ASTER (NDVI, NDII, and NDWI) contributed as much as the RADARSAT-1 image does. These indices are often the most significant contributors to the classification models (LUT = N1, N2, P, and F).

For bare soils (N1), soil color and moisture are the only soil properties explaining soil drainage class discrimination. Instead of being related to vegetation properties, the NDVI is also related to soil color or brightness [32]. Indeed, soil color is an important indicator of the drainage characteristics in the soil, even if the soil is not wet when it is being observed. Well-drained soils are well aerated and the iron compounds exist as oxidized, trivalent (ferric), red and yellow oxides, and hydroxides. In poorlydrained soils, iron exists mainly in the colorless, reduced, divalent (ferrous) form. Organic matter accumulates more in surface horizons of poorlydrained soils than in well-drained soils. As a result, poorlydrained soils usually have dark surface horizons because of high organic matter content and relatively little oxidized iron [24]. The NDVI was the most contributor for the DAC method (Wilks’ Lambda = 0.61; Table 6) and for the DTC method (importance score = 100%; Table 7). The second most significant variables on drainage class identification were the two RADARSAT-1 images acquired early on May 3rd and on May 10th for DAC and the two RADARSAT-1 images acquired on May 10th and on May 27th for DTC. The image acquired on May 10th has the lowest incidence angle (Table 2). Then it is more sensitive to soil moisture and contributes on both classification models.

For bare soils with sparse vegetation (N2), the NDII was the most significant variable in the DAC (Table 6) while it was the second one in the DCT model with 74% (Table 7), where the most significant is the RADARSAT-1 was acquired early on May 3rd. As expected, when soil component is dominant over vegetative cover, the contribution of dielectric properties expressed by RADARSAT-data and plant moisture content (NDII) extracted from the ASTER image were required in the classification models.

For vegetative areas, the NDVI was the most significant variable in the DAC and DTC methods except for recently emerged crop (C) areas. Under this land use type, three RADARSAT-1 configurations and the NDWI were required for soil drainage classes’ discrimination. This can be explained by the heterogeneity of this LUT in terms of vegetation cover density, plant composition, and growing status during the image acquisition periods. The RADASAT-1 data acquired on May 24th with the largest incidence angle (30–37°; Table 2) which was found more adequate for vegetation discrimination [27] is the second most significant variable in this land use type while it was the first significant for DTC model.

For wetland (H), the most significant variable for the two classification models was the RADARSAT-1 image acquired on June 10th.

The variation in land use type may lead to differences in the selection of the optimal subset of the remote sensing images to be used for discriminating soil drainage classes. This demonstrates that the quality of remote sensing data (physical properties of images and the acquisition time) has also an important impact on the classification models.

3.4. Soil Drainage Classification

Classification models (DAC and DTC) were built for each land use type with the selected raster images. The small difference in spatial resolution between spectral bands in visible and near-infrared (15 m) and RADARSAT-1 images (12.5 m) should has a less impact on final map compared to the short-wave infrared channels (30 m) which have low resolution. The arithmetic integration method was used for merging the differing scale datasets. This technique has been successfully used by [36] for merging digital aerial photography (1–5 m) with LANDSAT TM data (30 m). The final resolution maps derived from classification methods obtained after merging all land use type clusters is 1 : 1 250 scale which is more detailed than the conventional soil drainage map (1 : 40 000 scale).

The accuracy assessment for validation of classification models (DAC and DTC) was made with all the 1610 soil profiles available for this study (Table 8). The nonclassified soil profiles corresponds to missing soil drainage classes for a given land use type. The omission corresponds to ratio of number of missing data per the total number of classified profiles.

Without taking land use into consideration, the overall classification accuracies of soil drainage were 40% for DAC and 65% for DTC method. Only the poorly drained class (D6) was well classified with around 82% accuracy. These results were biased because of the predominance of D6 class which represents 51.7% of soil survey data.

Under bare soils (N1), using the DAC method, best classification results were obtained for two contrasting soil drainage classes D3 (well drained) with 64% and D6 (poorly drained) with 79% of accuracy. With the DTC method, only the moderately drained soil class D4 (54%) was the least misclassified. Under bare soil with sparse vegetation (N2), all existing drainage classes were relatively well classified for DTC method and for DAC method, the D3 and D6 (73%) classes were also well classified. Under vegetation land cover (P, C, and F) the overall classification accuracies were worse for DAC method. Therefore, good classification accuracies were observed for the recently emerged crops (89% for D3, 78% for D6, and 85% for D7) using DTC method. Under forest, only the imperfectly drained (D5) was classified with medium performance (69%) and under pasture and forage the drainage class D3 (56%) for DAC method. The best classification accuracies were obtained with the wetland (H) for the DTC method with a global accuracy of 85% and a Kappa coefficient of 0.74.

As function of the land use type, these classification results showed that the decision tree classifier (DTC) (Figure 7) is more accurate than the discriminant analysis classification (DAC) (Figure 8). The cross-tabulation between the two digital maps over the validation dataset showed only 43% of agreement. The drainage classes D7 and D6 were the best classified with 63% and 88%, respectively. The moderately drained class D4 has the lowest classification accuracy in two classification methods. This is probably due to the fact that the moderate soil drainage class is a combination of the moderately well and imperfectly poorly drained soil classes.

3.5. Similarity Analysis between Digital and Conventional Soil Drainage Maps

In order to quantify the similarity between the digital soil drainage map derived from the DTC classification model (which has been found to be the best digital soil drainage map), and the conventional soil map, a drainage probability map (13) was produced for each soil drainage class (conventional soil map (Figure 9(a)) and DTC method (Figure 9(b))). It is worth noting that in some soil polygons, the drainage probability of a given soil drainage class can have a “0” value in the conceptual model associated to the conventional soil map, whereas the DTC method gives a value superior to 0. In the same way, there is no soil polygon having a “1” value for drainage probability (100% probability of occurrence) with the DTC method. That is due to the more detailed and accurate information depicted by the digital soil drainage map derived with the DTC method which attributes the most probable drainage class to each pixel (12.5 m ×12.5 m) included in soil polygons of the conventional soil map. A more reliable estimation of soil drainage class proportion within each soil polygon is therefore proposed with the DTC method than using the conceptual model of conventional soil mapping procedure based on whole map statistics.

The map of the average weighted similarity (12) computed from each soil drainage class probability shows that both methods give relatively similar classification results (Figure 10). This map illustrates that 3% of the study area (66 km2) shows good agreement (≥) between the conventional soil drainage map and the digital map derived with the DTC method (Figure 10). Medium to high similarities () were observed for 83% (187 km2) of the study area. Soil polygons showing low similarities () between both methods represent only 14% of the study area as there are few polygons showing weak similarities (). It can be seen that digital soil drainage map derived with DTC method and remote sensing data remains in close agreement with conventional soil mapping results, although DTC method gives more precise soil drainage classification than conventional map as it indicates, where each soil drainage class is located within soil polygons. The DTC method can therefore be used in combination with remote sensing data and available soil legacy data for updating and upgrading soil maps.

4. Conclusions

Optical and remote sensing data provides useful and precise information on the spatial distribution of soil hydrological properties and biophysical characteristics of vegetation. This study has demonstrated that earth observation data can be effectively used for soil drainage mapping at regional scale (detailed soil survey). This paper presents an approach to predict soil drainage classes using selected raster data derived from RADARSAT-1 and spectral indices extracted from ASTER image. Two classification methods (DAC and DTC) were evaluated to derive digital soil drainage maps. Good classification accuracies were obtained with these two classification methods. However, the DTC method gives higher classification accuracy (29 to 92%) than the DAC method (33 to 79%). Good agreement has been observed between soil drainage maps derived from conventional soil map and digital soil drainage map from DTC method. This digital soil drainage map has been achieved on a pixel-base approach (90 m × 90 m) which is more precise than the published soil map (1 : 20 000 and 1 : 40 000 scale), where soil information is presented on a soil polygon basis using a conceptual model describing soil component distribution within soil map unit. Methods used for variable selection allowed us to explore the complex relationships between remote sensing data and soil drainage classes according to land use type and also to identify the most useful combination of bands that increases the classification accuracy. Thus, the DTC method, applied on optical and radar data, is potentially a very useful approach for realizing digital maps of soil drainage as well as other soil properties over similar conditions. It should be emphasized that these promising results have been obtained by an adequate merging method of the different dataset and scales and also by selecting a minimum and representative training dataset.

Further refinements such as upgrading accuracy, evaluating accuracy over broader areas, using a digital elevation model in these landscapes are ongoing. Works are also needed to determine the optimal size of training datasets. The potential of fully polarimetric RADARSAT-2 data that allow different polarimetric decompositions and high classification accuracies will also be studied.

Acknowledgments

The project was cofunded by the Canadian Space Agency and Agriculture and Agri-Food Canada through a Government-Related Initiatives Program (GRIP). The authors also would like to acknowledge Xiaoyuan Geng of Agri-Environmental Service Branch (AESB) and Dr. Andrew Davidson of Agriculture and Agri-food Canada for their language edition efforts and their constructive comments on the paper. They are grateful to anonymous reviewers for their constructive comments on the original version of this paper.