Special Issue: Machine Vision Sensors
Multisource Data Fusion Framework for Land Use/Land Cover Classification Using Machine Vision
Data fusion is a powerful tool for merging multiple sources of information to produce a better output than any individual source. This study describes data fusion for five land use/land cover types, that is, bare land, fertile cultivated land, desert rangeland, green pasture, and Sutlej basin river land, derived from remote sensing. A novel framework for multispectral and texture feature based data fusion is designed to identify the land use/land cover types correctly. Multispectral data is obtained using a multispectral radiometer, while a digital camera is used for the image dataset. Each image yielded 229 texture features, from which 30 optimized texture features per image were obtained by combining three feature selection techniques, that is, Fisher, Probability of Error plus Average Correlation, and Mutual Information. This 30-feature optimized texture dataset is merged with the five-feature spectral dataset to build the fused dataset. The texture, multispectral, and fused datasets are compared using machine vision classifiers. The fused dataset outperformed both individual datasets. The overall accuracy achieved using the multilayer perceptron for texture data, multispectral data, and fused data was 96.67%, 97.60%, and 99.60%, respectively.
1. Introduction
Conventional methodologies exist to measure and monitor land use/land cover (LU/LC) for regional and global environmental changes. Real-time LU/LC data is very important for resource management, future prediction, crop growth assessment, and sustainable development. Although LU/LC data is conventionally collected through field-based surveys, remote sensing data collection has its own importance due to factors such as time, accuracy, and transparency. During the last decade, space-borne multispectral data have proved more beneficial than ground and airborne data for land monitoring, assessment, and accurate information due to their increased spectral resolution. Previously, single-source datasets were mostly used for LU/LC classification, but recently multisource datasets have been used to obtain better overall accuracy. Land cover is a primary factor that plays an important role in physical and chemical variation in the environment. Changes in LU/LC can be accurately identified by continuously monitoring regional and global classification maps. When remote sensing data is used along with ground truth data, it provides reliable and cost-effective LU/LC information. Remote sensing has mostly used synthetic aperture radar (SAR) data for LU/LC information, whereas cloudy weather is one of the major obstacles to acquiring information through optical imagery. This has strengthened the significance of new tools and techniques for acquiring LU/LC thematic information from remote sensing data. In recent years, satellite-based remote sensing data has been a very active research area for earth scientists. Many researchers have worked on combining spectral and optical data, which enhanced the discrimination power of the integrated data and the overall classification accuracy [4, 5], and a simple fused model for land cover classification, named the fused mixture model, has been described.
A spatial and temporal adaptive-reflectance fusion model (STARFM) has also been proposed, which gave better accuracy results. For earth observation applications, sensor-based remotely sensed multispectral data provides better large-scale information than optical data. Fusion techniques enhance the operational capabilities of a dataset with respect to other tuning factors and overall accuracy. In data fusion, two or more datasets are merged to obtain a single dataset that retains the features of each individual dataset. A low resolution multispectral dataset is fused with a high resolution optical radar dataset to get better results in terms of spatial resolution and overall classification accuracy. Huang and colleagues described that LU/LC data is coarse in spatial resolution and changes frequently when observed through remote sensing, so it is very difficult to measure and monitor the change accurately. Different image fusion techniques, with their constraints at the implementation stages, are discussed in [12, 13]. These works showed quantitatively that fusion plays an important role in improving interoperability and reduces the ambiguity linked with data acquired by different sensors or by the same sensor with temporal variation. Quartz-rich and quartz-poor mineral types have been identified using an image fusion method with the supervised maximum likelihood (ML) classifier, achieving an overall accuracy and kappa coefficient of 96.53% and 0.95, respectively.
In this study, a framework is designed to analyze the potential of a multispectral dataset fused with a texture feature dataset for discriminating different LU/LC classes.
2. Study Area
This study explains the data fusion technique for LU/LC classification as an alternative to traditional ground-based field surveys. All experiments have been performed at the Islamia University of Bahawalpur, Punjab province (Pakistan), located at 29°23′44′′N and 71°41′1′′E. This study describes LU/LC monitoring, management, and classification using a fused dataset generated by combining photographic and multispectral radiometric data for an area that is mostly bare and desert rangeland. It would provide accurate results for LU/LC changes and prediction for better crop yield assessment.
3. Dataset
For this study, the multispectral dataset is obtained using a device named the Multispectral Radiometer Crop Scan (MSR5). It gives data equivalent to that of the Landsat TM (Thematic Mapper) satellite sensor. It has five spectral bands, that is, blue (B), green (G), red (R), near infrared (NIR), and shortwave infrared (SWIR), ranging from 450 nanometers to 1750 nanometers, while the digital photographic data are acquired by a high resolution NIKON Coolpix digital camera with 10.1-megapixel resolution.
4. Material and Methods
The objective of this study is to analyze multispectral data together with digital photographic data for five types of LU/LC. A multisource data fusion framework is designed to classify the subject LU/LC types accurately. Different image processing techniques have been applied to the photographic data, that is, color to gray scale conversion, contrast enhancement, and image sharpening. A still camera mounted on a stand at a height of 4 feet acquired the image dataset of the five LU/LC types. For the image dataset, 20 images of each LU/LC type have been acquired in JPG format with 24-bit depth. To increase the size of the image dataset, 5 nonoverlapping regions of interest (ROIs) of different pixel sizes have been taken on each image, so that a total of 100 (20 × 5) ROI images have been developed for each land type and a dataset containing 500 images of the five LU/LC types has been developed for experimentation. Similarly, for the multispectral dataset, data in five spectral bands are acquired: the visible bands, that is, B, G, and R, from 400 nanometers to 700 nanometers, and the invisible bands, that is, near infrared (NIR) from 750 nanometers to 900 nanometers and shortwave infrared (SWIR) from 1350 nanometers to 1650 nanometers. The multispectral dataset is acquired using a multispectral radiometer (MSR5), serial number 566. For the MSR5 dataset, 100 scans of each LU/LC type, 500 scans in total, are acquired at the same locations where the digital images have been acquired. To avoid the sun shadow effect, the whole data gathering process has been completed around noon (1.00 pm to 2.30 pm) under clear sky.
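The preprocessing named above (color to gray scale conversion and contrast enhancement) can be illustrated with a minimal sketch. It is not the image converter software used in the study: the ITU-R BT.601 luma weights and the linear min-max stretch are assumptions chosen for illustration, and the function names `to_gray` and `stretch_contrast` are hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to 8-bit gray levels.

    Assumption: BT.601 luma weights; the study does not state which
    conversion its image converter software applied.
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


def stretch_contrast(gray_image):
    """Linearly rescale gray levels so the darkest pixel maps to 0
    and the brightest to 255 (an assumed form of contrast enhancement)."""
    flat = [p for row in gray_image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray_image]
    scale = 255.0 / (hi - lo)
    return [[int(round((p - lo) * scale)) for p in row] for row in gray_image]


# Tiny 2 x 2 example image with made-up pixel values.
rgb = [[(255, 255, 255), (0, 0, 0)], [(120, 130, 140), (60, 70, 80)]]
gray = to_gray(rgb)
enhanced = stretch_contrast([[100, 150], [200, 125]])
```

The stretch only spreads the existing gray levels over the full 8-bit range; it does not sharpen the image, which the study handles as a separate step.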
Experimentation. This study is unique because there is no need for any special laboratory setup. Prior to further processing, the ROI images of different sizes have been converted from color to 8-bit gray scale and stored in bitmap (.bmp) format, because MaZda software works better for calculating texture features in this format. The contrast of the grayscale images has been enhanced using image converter software. The image dataset is then ready for calculating the first-order histogram and second-order texture parameters. MaZda software has been used to calculate 9 histogram features and 11 second-order (Haralick) texture features using the gray level cooccurrence matrix (GLCM) in four directions, that is, 0°, 45°, 90°, and 135°, at up to 5-pixel distance. This gives 220 (11 × 4 × 5) texture features which, together with the 9 histogram features, make 229 features in total for each ROI. The feature space for the whole image dataset thus contains 114500 (229 × 500) values. Because it is not easy to handle such a large feature space, three feature selection techniques, namely, Fisher (F), Probability of Error plus Average Correlation Coefficient (PE + AC), and Mutual Information (MI), have been employed to extract an optimized feature dataset. These three techniques have been merged (F + PA + MI) to extract the thirty most discriminant features (10 features by each technique) out of the 229-feature space for each ROI image. All experiments have been performed using MaZda software version 4.6 with the Weka data mining tool version 3.6.12 on an Intel® Core i3 processor at 2.4 gigahertz (GHz) with a 64-bit operating system.
Proposed Methodology. The proposed methodology is described in Figure 1. First, the data fusion algorithm is given with all procedural steps.
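As an illustration of feature ranking, the sketch below scores each feature by one common form of the Fisher criterion (between-class scatter divided by within-class scatter) and keeps the top-ranked ones. This is an assumption for illustration only: the exact formula implemented in MaZda may differ, the PE + AC and MI criteria are not shown, and the names `fisher_score` and `top_k_features` are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    if within == 0:
        # Perfectly separated classes score infinity; a constant feature scores 0.
        return float("inf") if between > 0 else 0.0
    return between / within


def top_k_features(rows, labels, k):
    """Return the indices of the k features with the highest Fisher score."""
    n_features = len(rows[0])
    scores = [
        fisher_score([row[j] for row in rows], labels) for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Made-up toy data: feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0], [1.1, 1.0], [3.0, 5.1], [3.1, 0.9]]
labels = ["a", "a", "b", "b"]
best = top_k_features(rows, labels, k=1)
```

In the study, each of the three techniques contributes 10 features; with a helper like this, that would correspond to calling the ranking with `k=10` for each criterion and taking the union of the selected indices.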
Data Fusion Algorithm
Start main {
    Input: multispectral and photographic land use/land cover datasets
    For Step 1 to Step 7 {
        Step 1 = Photographic and multispectral datasets of five land types.
        Step 2 = Data preprocessing.
        Step 3 = Develop the co-occurrence matrix for the photographic dataset and extract texture features.
        Step 4 = Multispectral dataset with five spectral bands of visible and invisible wavelengths.
        Step 5 = Three feature selection techniques, Fisher (F), probability of error plus average correlation (POE + AC), and mutual information (MI), are merged (F + PA + MI) and employed on the photographic dataset.
        Step 6 = Extract the 30 optimized texture features dataset.
        Step 7 = 30 optimized texture features + 5 spectral features = fused dataset.
    End For }
    Step 8 = Machine vision classifiers are employed on the fused dataset.
    Output = Land classification results
End main }
Figure 1 describes the proposed methodology in detail. In the first step, two different types of datasets are acquired, that is, the image dataset and the multispectral dataset. The second step employs image preprocessing filters, that is, Sobel or Laplacian, to sharpen the images, and the first-order and second-order texture features are extracted. In step three, the optimized feature dataset is obtained by applying the three combined feature selection techniques (F + PA + MI), and the 30 most discriminant features are extracted. These 30 optimized texture features are shown in Table 1. It has been observed in Table 2 that the texture features selected by Mutual Information (MI) are highly correlated, all being variants of “inverse difference moment,” but they differ in interpixel distance and direction, and due to this variation their computed values also differ. For every pixel distance and angular direction (θ), a different value is obtained for the “inverse difference moment” texture feature. For this study, interpixel distances up to 5 pixels have been taken with angles θ = 0°, 45°, 90°, and 135°.
As a result, we cannot ignore any value of the given texture features. Each MI based texture feature value describes the LU/LC dataset along its own direction, and as a whole these features disclose the entire texture pattern. It has been discussed by many researchers [10–14] that five control factors, that is, window size, texture derivative(s), input channel (i.e., the spectral channel used to measure the texture), quantization level of the output channel (8 bits, 16 bits, and 32 bits), and the spatial components, that is, interpixel distance and angle during cooccurrence matrix computation, play a very important role in the analysis of texture features.
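To make the dependence on distance and direction concrete, the following sketch builds a symmetric, normalized GLCM for a given interpixel distance and angle and computes the Haralick inverse difference moment, the sum of p(i, j) / (1 + (i − j)²). It is a pure-Python illustration under stated assumptions, not MaZda's implementation: the offset convention in `DIRECTIONS`, the symmetric counting, and the checkerboard test image are all choices made here.

```python
# Assumed pixel offsets (row step, column step) for the four directions.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, distance, angle, levels):
    """Symmetric, normalized gray level co-occurrence matrix of a 2-D list."""
    dr, dc = DIRECTIONS[angle]
    dr, dc = dr * distance, dc * distance
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # count each pixel pair in both orders
                counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:  # offset larger than the image: no pairs to count
        return counts
    return [[v / total for v in row] for row in counts]


def inverse_difference_moment(p):
    """Haralick inverse difference moment of a normalized GLCM."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Made-up 6 x 6 checkerboard with two gray levels.
board = [[(r + c) % 2 for c in range(6)] for r in range(6)]
idm_values = {
    (d, angle): inverse_difference_moment(glcm(board, d, angle, levels=2))
    for d in range(1, 6)
    for angle in DIRECTIONS
}
```

On the checkerboard, horizontal neighbors at distance 1 always differ while those at distance 2 always match, so the same feature takes different values for different distances. This is the reason none of the distance and angle variants can be treated as redundant.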
In the fourth step, these 30 texture features are combined with the 5 multispectral features, and a fused dataset is developed from the combination of the two different sources of data.
In the last step, this fused dataset has been supplied to different machine vision classifiers, that is, artificial neural network (MLP), Naïve Bayes (NB), Random Forest (RF), and J48; here J48 is the implementation of the C4.5 decision tree algorithm in Weka software. Figure 1 describes the multisource data fusion framework for LU/LC classification.
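The fusion step amounts to feature-level concatenation, which can be sketched as follows. The pairing of texture and spectral rows by sample order is an assumption made for this sketch, the zero values are placeholders rather than measured data, and `fuse` is a hypothetical helper name.

```python
def fuse(texture_rows, spectral_rows):
    """Concatenate per-sample texture and spectral feature vectors."""
    if len(texture_rows) != len(spectral_rows):
        raise ValueError("both sources must describe the same samples")
    return [list(t) + list(s) for t, s in zip(texture_rows, spectral_rows)]


# Placeholder values only: 500 samples, 30 texture and 5 spectral features.
texture = [[0.0] * 30 for _ in range(500)]
spectral = [[0.0] * 5 for _ in range(500)]
fused = fuse(texture, spectral)
```

Each fused row then carries 35 features, matching the feature count the classifiers receive in the last step.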
5. Results and Discussion
As discussed above for the image dataset, four of the five ROI sizes did not give satisfactory results for classification. With the MLP, NB, J48, and RF classifiers, an overall classification accuracy of less than 75% has been observed on these four ROI sizes, which is not acceptable, while the remaining ROI size gave promising results for image data classification. Therefore, to generate the fused dataset, the texture features from this ROI size have been merged with the multispectral dataset. For classification, different machine vision classifiers, that is, Multilayer Perceptron (MLP), Naïve Bayes (NB), Random Forest (RF), and J48, have been employed on this optimized fused dataset using Weka software version 3.6.12. Before loading the fused dataset into Weka, it has been converted into the Attribute Relation File Format (ARFF). The fused dataset has also been compared with the individual texture and multispectral datasets. The fused dataset has been separated into 66% for training and 34% for testing with the 10-fold cross-validation method, and the same strategy has been applied to the individual datasets, namely, multispectral and texture. Besides this, several other performance evaluation measures, that is, mean absolute error (MAE), root mean squared error (RMSE), confusion matrix, true positive (TP), false positive (FP), receiver-operating characteristic (ROC), time complexity, and overall accuracy (OA), have also been calculated. At first, the fused dataset for LU/LC classification has been supplied to the different machine vision classifiers, namely, MLP, RF, NB, and J48, with the optimized set of 35 features, and the classifiers have shown different accuracy results. The overall accuracy with the other performance parameters is shown in Table 3.
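Overall accuracy and the kappa coefficient can both be derived from a confusion matrix, as the sketch below shows. The rows-as-actual, columns-as-predicted layout is an assumption, the 2 × 2 matrix is invented for illustration and is not taken from the study's tables, and the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(cm)
    total = sum(map(sum, cm))
    observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals for each class.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Invented two-class matrix (rows: actual, columns: predicted).
toy = [[48, 2], [4, 46]]
oa = overall_accuracy(toy)
kappa = cohen_kappa(toy)
```

For this toy matrix the overall accuracy is 0.94 and kappa is 0.88; kappa is lower because half of the observed agreement would be expected by chance with balanced classes.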
Table 4 represents the confusion matrix of the fused dataset obtained with the MLP classifier; the diagonal of the table shows the maximal values, that is, the instances correctly placed in each of the five LU/LC classes. MLP shows the best overall accuracy among the employed classifiers.
The LU/LC classification graph of the fused dataset using MLP is shown in Figure 2. Each LU/LC type has 100 data instances (ROIs), and these instances have been classified into the five classes. In the graph, “blue color” is used for the fused dataset, “green color” for texture, and “red color” for the multispectral dataset. Similarly, for the texture and multispectral datasets, the same classifiers with the same strategy have been employed as for the fused dataset. For the texture dataset, the 30 optimized texture features have been used, while for the multispectral dataset, the 5 spectral features have been used individually. For both the texture and multispectral datasets, the MLP classifier has shown higher overall accuracy than the other classifiers, along with better values of the other performance evaluation parameters, including kappa coefficient, TP, FP, ROC, MAE, RMSE, and time complexity [21, 22]. The overall accuracy with the other performance evaluation parameters is shown in Table 5.
Table 6 represents the confusion matrix for the texture dataset; it contains the actual and predicted data for the MLP classifier, which shows the best overall accuracy among the employed classifiers. The texture dataset LU/LC classification graph of MLP is shown in Figure 2.
The MLP confusion matrix for the multispectral dataset is shown in Table 8; it contains the actual and predicted data for the MLP classifier, which also shows the best overall accuracy among the employed classifiers for the multispectral dataset.
The multispectral LU/LC dataset classification graph of MLP is also shown in Figure 2.
Finally, a comparative LU/LC classification graph of the fused, multispectral, and texture datasets using the MLP classifier is shown in Figure 3. The fused dataset has better overall accuracy than the multispectral and texture datasets. This shows that data fusion plays a vital role in better land assessment, management, and accurate monitoring [25, 26].
6. Conclusion
This study is focused on the classification of five different types of LU/LC datasets. Four data mining classifiers, that is, MLP, RF, NB, and J48, have been employed on the fused, texture, and multispectral datasets. These three types of dataset have been examined for overall classification accuracy along with the other performance evaluation factors discussed above in Results and Discussion. All the classifiers have shown satisfactory results, but the multilayer perceptron (MLP) result was considerably better than the others. With MLP, an overall accuracy of 96.67% for texture data, 97.60% for multispectral data, and 99.60% for the fused dataset has been observed. The fused dataset has shown the best overall accuracy among all types of dataset. The final classification results of the three datasets do not differ greatly, but the other performance evaluation factors, that is, kappa statistics, RMSE, TP, FP, MAE, and execution time, also play an important role in the analysis. It is worth mentioning that the photographic data (texture data) is visual data with a visible wavelength range from 400 nm to 700 nm and has a classification accuracy of 96.67%, while the multispectral data includes visual plus nonvisual data (NIR and SWIR), with a nonvisual wavelength range from 750 nm to 1650 nm, and attained a classification accuracy of 97.60%. The fused dataset, which integrates both types of data, that is, multispectral and statistical texture, achieved a better overall accuracy of 99.60% compared with the multispectral and texture datasets.
Finally, it is observed that, as the feature set was enlarged by fusion, the overall accuracy also improved. This shows that multisource data integration significantly improves the analysis and classification of LU/LC types and that the employed classification framework is a powerful tool for generating reliable, comprehensive, and accurate results for LU/LC classification. In addition, this method can be used for decision-making, future prediction, and quick and accurate analysis of land use and land cover when employing sophisticated rules on multisource LU/LC datasets. In future, the effect of variation in light intensity with incident light angle will be verified.
Conflicts of Interest
The authors declare no conflicts of interest.
Authors’ Contributions
The main concept and experimental work were performed by Salman Qadri, and Dr. Dost Muhammad Khan made critical revisions and approved the final version. All other coauthors reviewed and approved the final paper.
Acknowledgments
The authors highly acknowledge the Department of Computer Science & IT, Islamia University of Bahawalpur, Pakistan, for providing all experimental facilities and convenient environment to accomplish this study and they especially thank Mr. Muzammil ur Rehman, Lecturer at DCS & IT, Islamia University of Bahawalpur, Pakistan, and Mr. Dell Nantt, CROPSCAN Corporation, Minnesota, USA, for their technical support.
References
L. C. Chen, T.-A. Teo, Y.-C. Shao, Y.-C. Lai, and J.-Y. Rau, “Fusion of LIDAR data and optical imagery for building modeling,” in Proceedings of the International Archives of Photogrammetry and Remote Sensing, vol. 35, pp. 732–737, 2004.
M. S. Shifa, M. S. Naweed, M. Omar, M. Zeeshan Jhandir, and T. Ahmed, “Classification of cotton and sugarcane plants on the basis of their spectral behavior,” Pakistan Journal of Botany, vol. 43, no. 4, pp. 2119–2125, 2011.
S. A. M. Rodrigues, Motivations, Experiences And Potential Impacts of Visitors to a Monastery in New Zealand: A Case Study, University of Waikato, 2012.
R. G. Congalton and K. Green, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, CRC Press, 2008.