Research Article | Open Access
Bin Li, Kaili Cheng, Zhezhou Yu, "Histogram of Oriented Gradient Based Gist Feature for Building Recognition", Computational Intelligence and Neuroscience, vol. 2016, Article ID 6749325, 9 pages, 2016. https://doi.org/10.1155/2016/6749325
Histogram of Oriented Gradient Based Gist Feature for Building Recognition
We propose a new gist feature extraction method for building recognition and name the extracted feature the histogram of oriented gradient based gist (HOG-gist). The proposed method individually computes the normalized histograms of multiorientation gradients for the same image at four different scales. The traditional approach uses Gabor filters with four angles and four different scales to extract orientation gist feature vectors from an image. Our method, in contrast, uses the normalized histograms of oriented gradients as the orientation gist feature vectors of the same image. These HOG-based orientation gist vectors, combined with intensity and color gist feature vectors, form the proposed HOG-gist vectors. The HOG-gist contains four multiorientation histograms (four orientation gist feature vectors), and its texture description ability is stronger than that of the traditional gist using Gabor filters with four angles. Experimental results on the Sheffield Buildings Database verify the feasibility and effectiveness of the proposed HOG-gist.
1. Introduction
Building recognition is attracting increasing interest from researchers, since it can be applied to many real-world problems, such as robot vision or localization, mobile device navigation [2, 3], and building labeling in videos. However, building recognition is a challenging task because building images may be taken from different viewpoints and under different lighting conditions, or may suffer from occlusion by billboards, trees, vehicles, or other buildings. The biggest difficulty in building recognition is designing a feature extraction algorithm that can accurately and completely describe building characteristics.
Interest points extracted by the Harris corner detector were applied to matching buildings in the world space for mobile devices. Li and Shapiro used consistent line clusters for content-based image retrieval. Specifically, the color, orientation, and spatial features of line segments are exploited to group an image into line clusters. The intracluster and intercluster relationships were then used to recognize buildings in photographic images. Zhang and Košecká proposed a two-step hierarchical building recognition method. The first step uses localized color histograms, and the second step refines the matches using SIFT descriptors. Fritz et al. applied the “Informative Descriptor Approach” to SIFT features (i-SIFT descriptors) and proposed a robust building detection and recognition method.
Li and Allinson pointed out that all of the building recognition algorithms mentioned above have two drawbacks. (1) They are based on the detection of low-level features such as vanishing points and line segments. The representation of building characteristics is restricted, since these low-level features cannot reveal the true semantic concepts of building images. (2) Using these raw high-dimensional feature vectors directly for recognition leads to large memory requirements and high computational cost. Li and Allinson proposed a new building recognition method to address these two drawbacks. They use the gist feature extraction approach proposed by Siagian and Itti to obtain gist features of building images. In Siagian and Itti’s gist extraction method, 34 feature maps are created by filtering the original image in orientation channels, color channels, and intensity channels at multiple spatial scales. Each feature map is divided into a 4 × 4 grid, and the mean value of each grid cell is calculated to produce the 16 values of a gist vector. As a result, the original image is represented by a 544-dimension feature vector. To reduce computational costs and preserve as much discriminative information as possible, several subspace learning dimensionality reduction algorithms, such as principal component analysis (PCA), locality preserving projections (LPP), and linear discriminant analysis (LDA), are used for dimensionality reduction before recognition. The gist feature extraction and dimensionality reduction-based building recognition method proposed by Li and Allinson has been shown to be more effective than the low-level feature methods [3–7]. Li et al. proposed the subregion’s multiscale gist feature (SM-gist) extraction method. The SM-gist method divides a building image into subregions, and gist vectors are extracted from these subregions individually. The SM-gist extraction method mitigates the interference of nonuniform illumination. Zhao et al. proposed the multiscale gist (MS-gist) feature for building recognition. The MS-gist features stably capture representative features of building images under rotation, varying lighting conditions, and occlusion.
The gist feature extraction method proposed by Siagian and Itti was originally used for scene recognition, but building recognition differs from scene recognition because building surfaces contain many lines. For building recognition tasks, the texture feature of buildings is therefore more important than the color and intensity features. Siagian and Itti’s gist feature extraction method uses Gabor filters with only four angles to extract the orientation information, so its texture description ability is limited. To improve on this, we propose the histogram of oriented gradient based gist (HOG-gist) feature extraction method. The histogram of oriented gradient (HOG) was first proposed by Dalal and Triggs. Owing to its strong texture and shape description ability, the HOG has been used in human detection, face recognition [15, 16], image registration, and many other tasks [18–21]. Our proposed HOG-gist extraction method individually computes the normalized histograms of multiorientation gradients for the same image at four different scales. These normalized histograms of oriented gradients are the orientation gist feature vectors of an image. Combined with the intensity and color gist feature vectors of the traditional method, they form the proposed HOG-gist vectors.
This paper is organized as follows: we give a brief review of Siagian and Itti’s gist feature extraction method in Section 2; the histogram of oriented gradient based gist (HOG-gist) feature extraction method is proposed in Section 3; recognition performance on the Sheffield Buildings Database is detailed in Section 4, while Section 5 concludes the paper.
2. Gist Feature Extraction
Psychological research has shown that humans can grasp the “gist” of an image by glancing at it for just a few seconds. Siagian and Itti’s gist feature extraction method aims to simulate this human ability. The method has two main steps: saliency feature map construction and gist feature extraction. Saliency feature maps are constructed from low-level visual features, including the intensity channel, color channel, and orientation channel, which are extracted in parallel. Equation (1) is used to compute the intensity channel [8, 23]:
$$I = \frac{r + g + b}{3}, \tag{1}$$
where $r$, $g$, and $b$ are the red, green, and blue channels of the input image.
For the intensity channel and the color channel, five Gaussian image pyramids, $I(\sigma)$, $R(\sigma)$, $G(\sigma)$, $B(\sigma)$, and $Y(\sigma)$, with nine spatial scales ranging from 1 : 1 (scale zero) to 1 : 256 (scale eight) in eight octaves, are created [8, 23], where $\sigma \in [0..8]$. The intensity and color saliency feature maps are obtained by applying the center-surround operation to these Gaussian pyramids. The center-surround operation defined by Siagian and Itti is as follows [8, 23]: a pixel at scale $c \in \{2, 3, 4\}$ is the center, and the corresponding pixels at scale $s = c + \delta$, where $\delta \in \{3, 4\}$, form the surround. From (3), we get six intensity feature maps [8, 23]:
$$I(c, s) = \left| I(c) \ominus I(s) \right|. \tag{3}$$
Twelve color feature maps [8, 23] are obtained by (4), where $\ominus$ denotes the cross-scale difference between two images in a Gaussian pyramid.
In total, 34 saliency feature maps are computed: 6 for intensity, 12 for color, and 16 for orientation.
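The map count above can be checked with a short sketch. It assumes the center scales c ∈ {2, 3, 4} and surround offsets δ ∈ {3, 4} of Itti et al. [23], two color opponency channels (labelled RG and BY here purely for illustration), and the four Gabor angles at four scales mentioned for the orientation channel. All names in the code are hypothetical.

```python
# Assumed parameters (from Itti et al. [23]; not spelled out in the text above).
CENTER_SCALES = (2, 3, 4)
SURROUND_DELTAS = (3, 4)


def center_surround_pairs():
    """List every (center, surround) scale pair with surround = center + delta."""
    return [(c, c + d) for c in CENTER_SCALES for d in SURROUND_DELTAS]


pairs = center_surround_pairs()
n_intensity = len(pairs)        # one intensity map per (c, s) pair
n_color = 2 * len(pairs)        # assumed: two opponency channels (RG, BY)
n_orientation = 4 * 4           # four Gabor angles x four scales
total_maps = n_intensity + n_color + n_orientation
gist_dimension = total_maps * 16  # 16 grid means per feature map

print(pairs)
print(n_intensity, n_color, n_orientation, total_maps, gist_dimension)
```

Under these assumptions the sketch reproduces the 6 + 12 + 16 = 34 maps and the 544-dimension descriptor quoted in the text.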
Each map is then divided into 4 × 4 grid subregions, and the mean of each subregion is taken to produce the 16 values of a 16-dimension gist feature vector. We thus get 34 gist feature vectors from the 34 feature maps: 6 intensity gist feature vectors, 12 color gist feature vectors, and 16 orientation gist feature vectors. The combination of all the gist feature vectors is a 544-dimension feature vector, so each building image can be represented by this 544-dimension feature vector. Figure 1 shows the main process of Li and Allinson’s building recognition method. In Figure 1, Siagian and Itti’s gist feature extraction method is used to extract the gist features from building images. Then, a dimensionality reduction algorithm is used to reduce the dimension of the original feature vectors from 544 to a much lower dimension before classification.
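The grid-averaging step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: feature maps are plain nested lists, a 4 × 4 grid is assumed (it yields the 16 values per map stated above), and the function names `gist_vector` and `gist_descriptor` are hypothetical.

```python
def gist_vector(feature_map, grid=4):
    """Return the mean of each cell of a grid x grid partition, in row-major order.

    Assumes the map has at least `grid` rows and columns.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    values = []
    for gy in range(grid):
        r0, r1 = gy * rows // grid, (gy + 1) * rows // grid
        for gx in range(grid):
            c0, c1 = gx * cols // grid, (gx + 1) * cols // grid
            cell = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            values.append(sum(cell) / len(cell))
    return values


def gist_descriptor(feature_maps, grid=4):
    """Concatenate the gist vectors of all feature maps into one descriptor."""
    descriptor = []
    for fmap in feature_maps:
        descriptor.extend(gist_vector(fmap, grid))
    return descriptor


# Toy 8 x 8 map whose pixel value is its row-major index.
toy_map = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
vec = gist_vector(toy_map)
full = gist_descriptor([toy_map] * 34)  # 34 maps -> 34 * 16 values
print(len(vec), len(full))
```

With 34 feature maps the concatenated descriptor has 34 × 16 = 544 entries, matching the dimension given in the text.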
3. Histogram of Oriented Gradient Based Gist Feature (HOG-gist) Extraction
In this section, we will introduce in detail our histogram of oriented gradient based gist feature (HOG-gist) extraction method and our building recognition method.
3.1. Orientation Gist Feature Extraction
The orientation gist features can be extracted by the following five steps. This process is shown by Figure 2.
The process is as follows:
(1) An image pyramid $I(\sigma)$ is created on the intensity channel (see (1)) with four spatial scales ranging from 1 : 1 (scale zero) to 1 : 8 (scale three) in three octaves, that is, $\sigma \in \{0, 1, 2, 3\}$. A histogram of oriented gradient is computed at each scale of $I(\sigma)$.
(2) Use the gradient filter $[-1, 0, 1]$ with no smoothing [14, 15] to compute the horizontal gradient $G_x$ and the vertical gradient $G_y$ of $I(\sigma)$.
(3) Compute the magnitude $M$ and angle $\theta$ of the gradient [14, 15]:
$$M = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(\frac{G_y}{G_x}\right).$$
(4) Compute a histogram with $n$ orientation bins in 0°–180°. The magnitudes $M$ whose angles $\theta$ belong to the same bin are added up as the value of this bin. The value of $n$ is determined according to the experimental results in Section 4.1.
(5) Normalize each histogram with the L2-Hys (Lowe-style clipped L2 norm) normalization method.
After the histograms of oriented gradients have been computed at all four scales, we get four $n$-dimensional vectors, where $n$ is the number of orientation bins. These four histograms of oriented gradients are the orientation gist feature vectors of the HOG-gist.
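The five steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: 2 × 2 block averaging stands in for a proper Gaussian pyramid level, border pixels are skipped when applying the [−1, 0, 1] filter, the clipping threshold 0.2 is the value commonly used with L2-Hys normalization, and all function names are hypothetical.

```python
import math


def downsample(img):
    """Halve the resolution by averaging 2 x 2 blocks (stand-in for a Gaussian pyramid level)."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)] for r in range(rows)]


def orientation_histogram(img, n_bins):
    """Sum gradient magnitudes into n_bins unsigned-orientation bins over 0-180 degrees."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = img[r][c + 1] - img[r][c - 1]  # [-1, 0, 1] applied horizontally
            gy = img[r + 1][c] - img[r - 1][c]  # [-1, 0, 1] applied vertically
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            idx = min(int(angle / bin_width), n_bins - 1)  # clamp rounding at 180
            hist[idx] += mag
    return hist


def l2_hys(hist, clip=0.2, eps=1e-12):
    """L2-normalize, clip each entry at `clip` (assumed 0.2), then renormalize."""
    norm = math.sqrt(sum(v * v for v in hist)) + eps
    clipped = [min(v / norm, clip) for v in hist]
    norm = math.sqrt(sum(v * v for v in clipped)) + eps
    return [v / norm for v in clipped]


def hog_orientation_gist(intensity, n_bins, n_scales=4):
    """Return one normalized orientation histogram per pyramid scale."""
    vectors, level = [], intensity
    for scale in range(n_scales):
        if scale > 0:
            level = downsample(level)
        vectors.append(l2_hys(orientation_histogram(level, n_bins)))
    return vectors


# Toy 32 x 32 intensity image with a single vertical edge.
edge = [[0.0 if c < 16 else 1.0 for c in range(32)] for r in range(32)]
vecs = hog_orientation_gist(edge, n_bins=60)
print(len(vecs), len(vecs[0]))
```

For the toy image, every gradient is horizontal, so all of the histogram mass at each scale falls into the first orientation bin.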
3.2. Our Proposed Building Recognition Method
Figure 3 shows the building recognition method based on our HOG-gist. The orientation channel in Figure 3 refers to the extraction process of the orientation gist feature, which has been explained in detail in Figure 2 and Section 3.1.
In Figure 3, the color channel and intensity channel represent the extraction of the color gist feature vectors and intensity gist feature vectors of the building image. These gist feature vectors are extracted in the same way as in the traditional method shown in Figure 1. After processing, 6 intensity gist feature vectors and 12 color gist feature vectors are obtained from the intensity channel and the color channel, respectively. Then, the 6 intensity gist feature vectors, 12 color gist feature vectors, and 4 orientation gist feature vectors are combined into our proposed HOG-gist. Each intensity gist feature vector and each color gist feature vector is a 16-dimension vector, and each orientation gist feature vector has $n$ dimensions, where $n$ is the number of orientation bins. Therefore, the HOG-gist is a $(18 \times 16 + 4 \times n)$-dimension gist feature vector that reflects the characteristics of the original building image.
Then, dimensionality reduction algorithms (such as LPP, MFA, PCA, and NPE) are applied to the HOG-gist feature vector to obtain a feature vector of lower dimension. Finally, the lower-dimensional feature vector is classified by a classifier, such as the Nearest Neighbor Classifier (NN), Support Vector Machine (SVM), or BP neural network (BP).
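As an illustration of the final classification step, the sketch below implements a plain nearest neighbor classifier on already-reduced feature vectors. The Euclidean distance is an assumption (the text does not name the metric), and the function name is hypothetical.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training vector closest to `query` (squared Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for x, y in zip(train_x, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(x, query))
        if dist < best_dist:
            best_label, best_dist = y, dist
    return best_label


# Toy low-dimensional features for two buildings, "a" and "b".
train_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_y = ["a", "a", "b"]
print(nearest_neighbor_predict(train_x, train_y, [4.0, 4.0]))
print(nearest_neighbor_predict(train_x, train_y, [0.2, 0.1]))
```

In the pipeline described above, `train_x` would hold the HOG-gist vectors after dimensionality reduction and `train_y` the building identities.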
4. Experiments
To evaluate the performance of HOG-gist, we carry out experiments on the Sheffield Buildings Database. The Sheffield Buildings Database contains 3192 building images of 40 buildings, and for each building the number of building images varies from 100 to 400. The size of these images is . Figure 4 shows sample images of the Sheffield Buildings Database. From Figure 4, we can see that the buildings are photographed from different viewpoints, that the images may differ in scale and illumination, and that some images show occlusion and rotation.
Because the number of images differs from building to building, we select the first 20 images of each building to form a subset, which we name D1. D1 consists of 40 buildings with 20 images each, that is, 800 images in total.
In our experiments, D1 was partitioned into different sample collections. Each collection is defined by the number of images per building that were selected at random for training; the remaining images of each building were used for testing. For each collection, 50 random splits are generated, and the final result is obtained by taking the mean of the 50 recognition accuracy values.
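The evaluation protocol can be sketched as follows: draw a fixed number of training images per building at random, test on the rest, and average the accuracy over 50 splits. The function names, the `classify` callback signature, and the fixed seed are assumptions made for this illustration.

```python
import random


def split_per_building(labels, n_train, rng):
    """Return (train_idx, test_idx) with n_train randomly chosen images per building."""
    by_building = {}
    for idx, label in enumerate(labels):
        by_building.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_building.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test


def mean_accuracy(features, labels, n_train, classify, n_splits=50, seed=0):
    """Average recognition accuracy over n_splits random train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_splits):
        train, test = split_per_building(labels, n_train, rng)
        train_x = [features[i] for i in train]
        train_y = [labels[i] for i in train]
        correct = sum(classify(train_x, train_y, features[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy stand-in for D1: 40 buildings x 20 images, one trivially separable feature each.
labels = [b for b in range(40) for _ in range(20)]
features = [[float(b)] for b in labels]


def lookup(train_x, train_y, query):
    """Toy classifier: return the label of the first identical training vector."""
    return train_y[train_x.index(query)]


train_idx, test_idx = split_per_building(labels, 5, random.Random(0))
acc = mean_accuracy(features, labels, n_train=5, classify=lookup)
print(len(train_idx), len(test_idx), acc)
```

Any classifier with the same `(train_x, train_y, query)` signature can be passed as `classify`; the toy lookup is used only to keep the example self-contained.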
4.1. Experiments for Parameter Selection
In this subsection, we aim to choose a proper value for the parameter $n$, the number of orientation bins of the histogram of oriented gradient in our HOG-gist. We compute a histogram with $n$ orientation bins in the interval 0°–180°. If the angular step length is 2°, 3°, 4°, 5°, and 6°, $n$ will be 90 (180°/2°), 60 (180°/3°), 45 (180°/4°), 36 (180°/5°), and 30 (180°/6°), respectively. Parameter selection experiments are conducted on three sample collections of the D1 subset (Figure 5). In this experiment, LPP was used for dimensionality reduction, and classification was performed with the Nearest Neighbor Classifier (NN). The mean recognition rate corresponding to each value of $n$ is shown in Figures 5(a)–5(c).
From Figures 5(a)–5(c), it can be seen that HOG-gist achieves the highest recognition rate when the number of orientation bins is 60. As a result, we set the number of orientation bins to 60 in the following experiments. Since there are 60 bin values in the histogram, the dimension of an orientation gist feature vector is 60. There are four 60-dimension orientation gist feature vectors extracted by the HOG-gist extraction method. The HOG-gist is therefore a 528-dimensional gist feature vector, whose dimension is similar to that of Siagian and Itti’s gist.
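The dimension quoted above follows from simple arithmetic. Writing $n$ for the number of orientation bins (a symbol chosen here for illustration), the count can be worked out as:

```latex
\begin{align*}
\dim(\text{HOG-gist})
  &= \underbrace{6 \times 16}_{\text{intensity}}
   + \underbrace{12 \times 16}_{\text{color}}
   + \underbrace{4 \times n}_{\text{orientation}}
   = 288 + 4n, \\
n = 60:\quad \dim(\text{HOG-gist}) &= 288 + 240 = 528, \\
\dim(\text{Siagian and Itti's gist}) &= 34 \times 16 = 544 .
\end{align*}
```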
4.2. Building Recognition Using Different Dimensionality Reduction Algorithms
In this experiment, we evaluated the performance of our HOG-gist by comparing it with Siagian and Itti’s gist. LPP, NPE, PCA, and MFA are each employed in turn as the dimensionality reduction algorithm. Classification is then performed with the Nearest Neighbor Classifier (NN). The mean accuracy values of Siagian and Itti’s gist and our HOG-gist are listed in the first and second rows of each table, respectively.
From the results shown in Tables 1–4, one can find the following:
(1) As the number of training samples increases, the mean recognition rates of both gist feature extraction methods rise, although by different amounts.
(2) Our HOG-gist performs better than Siagian and Itti’s gist regardless of which dimensionality reduction algorithm is selected.
(3) In most situations, the feature dimensions at which our HOG-gist achieves its best recognition results are much lower than those of Siagian and Itti’s gist. This indicates that the texture and shape description ability of our HOG-gist is better than that of Siagian and Itti’s gist. Therefore, our HOG-gist feature can be reduced to a lower dimension while still achieving a higher recognition rate.
(4) MFA is a supervised subspace learning dimensionality reduction algorithm. The average recognition rate of HOG-gist combined with MFA is higher than that of HOG-gist combined with the other dimensionality reduction algorithms, and the same holds for Siagian and Itti’s gist feature.
(5) As an unsupervised dimensionality reduction algorithm, NPE performs satisfactorily. The mean recognition accuracy values of HOG-gist combined with NPE are only slightly lower than those of HOG-gist combined with MFA.
4.3. Building Recognition Using Different Classifiers
In this experiment, building recognition was performed by combining HOG-gist or the traditional gist (Siagian and Itti’s gist) with different classifiers, in order to compare the performance of the two features. LPP was used as the dimensionality reduction algorithm for HOG-gist. The low-dimensional features obtained after dimensionality reduction were then classified individually by four different classifiers: the Nearest Neighbor Classifier (NN), SVM with the radial basis kernel function, and BP neural networks with two and three hidden layers. The two BP neural networks are denoted as BP1 and BP2 in Figure 6, respectively. The same experiments were then repeated for Siagian and Itti’s gist feature.
The mean recognition results are in Figure 6. In Figure 6, the solid line shows the result of HOG-gist combined with a certain classifier, and the dashed line in the same color is the result of Siagian and Itti’s gist combined with the same classifier. The horizontal axis of Figure 6 is the number of training samples, and the vertical axis represents the mean recognition accuracy corresponding to each number of training samples.
From Figure 6, we can draw the following conclusions:
(1) Whichever classifier is used, the HOG-gist achieves a higher mean recognition rate than the traditional gist (Siagian and Itti’s gist) combined with the same classifier, which shows that the HOG-gist is superior to Siagian and Itti’s gist feature.
(2) The HOG-gist achieves its highest mean recognition rate with SVM, its second highest with NN, and its lowest with BP. Siagian and Itti’s gist feature shows the same ranking across these classifiers.
(3) The mean recognition rate of Siagian and Itti’s gist with SVM is higher than that of HOG-gist with BP or NN, which shows that the selection of the classifier is as important as the selection of the feature extraction method.
5. Conclusions
Building surfaces contain many lines, so the texture feature of buildings is more important than the color and intensity features for building recognition tasks. To improve the texture description ability of the traditional gist feature extraction method, we proposed the histogram of oriented gradient based gist (HOG-gist) feature extraction method. Our method employs the normalized histograms of oriented gradients as the orientation gist feature vectors of an image. These orientation gist vectors, combined with the intensity and color gist feature vectors extracted by the traditional method, form our HOG-gist. The HOG-gist contains four multiorientation histograms (four orientation gist feature vectors), and its texture description ability is stronger than that of the traditional gist using Gabor filters with four angles. The HOG-gist is a 528-dimensional gist feature vector, whose dimension is similar to that of Siagian and Itti’s gist, but its mean recognition accuracy is higher than that of Siagian and Itti’s gist.
The authors declare that none of them have any competing interests in the manuscript, and there is no conflict of interests regarding the publication of this article.
This research is supported by (1) Doctoral Scientific Research Fund of Northeast Dianli University (BSJXM-201520), (2) Key Scientific and Technological Project of Science and Technology Department of Jilin Province of China (20150204007GX), and (3) National Natural Science Foundation of China (61602108).
References
[1] M. M. Ullah, A. Pronobis, B. Caputo, J. Luo, P. Jensfelt, and H. I. Christensen, “Towards robust place recognition for robot localization,” in Proceedings of the 2008 IEEE International Conference on Robotics and Automation (ICRA '08), pp. 530–537, IEEE, Pasadena, Calif, USA, May 2008.
[2] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” Advances in Neural Information Processing Systems, vol. 14, no. 6, pp. 585–591, 2002.
[3] R. Hutchings and W. Mayol-Cuevas, “Building recognition for mobile devices: incorporating positional information with visual features,” Tech. Rep. CSTR-06-017, Computer Science, University of Bristol, 2005.
[4] Y. Li and L. G. Shapiro, “Consistent line clusters for building recognition in CBIR,” in Proceedings of the 16th International Conference on Pattern Recognition, IEEE Computer Society, Quebec City, Canada, August 2002.
[5] W. Zhang and J. Košecká, “Hierarchical building recognition,” Image and Vision Computing, vol. 25, no. 5, pp. 704–716, 2007.
[6] G. Fritz, C. Seifert, M. Kumar, and L. Paletta, “Building detection from mobile imagery using informative SIFT descriptors,” in Image Analysis: 14th Scandinavian Conference, SCIA 2005, Joensuu, Finland, June 19–22, 2005. Proceedings, vol. 3540 of Lecture Notes in Computer Science, pp. 629–638, Springer, Berlin, Germany, 2005.
[7] J. Li and N. M. Allinson, “Subspace learning-based dimensionality reduction in building recognition,” Neurocomputing, vol. 73, no. 1–3, pp. 324–330, 2009.
[8] C. Siagian and L. Itti, “Rapid biologically-inspired scene classification using features shared with visual attention,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 29, no. 2, pp. 300–312, 2007.
[9] I. T. Jolliffe, Principal Component Analysis, Springer Series in Statistics, Springer, Berlin, Germany, 2002.
[10] X. He, “Locality preserving projections,” Advances in Neural Information Processing Systems, vol. 45, no. 1, pp. 186–197, 2010.
[11] W. S. Rayens, “Discriminant analysis and statistical pattern recognition,” Journal of the Royal Statistical Society, vol. 168, no. 3, pp. 635–636, 2005.
[12] B. Li, W. Pang, Y. Liu et al., “Building recognition on subregion's multiscale gist feature extraction and corresponding columns information based dimensionality reduction,” Journal of Applied Mathematics, vol. 2014, Article ID 898705, 10 pages, 2014.
[13] C. Zhao, C. Liu, and Z. Lai, “Multi-scale gist feature manifold for building recognition,” Neurocomputing, vol. 74, no. 17, pp. 2929–2940, 2011.
[14] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, IEEE, San Diego, Calif, USA, June 2005.
[15] O. Déniz, G. Bueno, J. Salido, and F. De la Torre, “Face recognition using histograms of oriented gradients,” Pattern Recognition Letters, vol. 32, no. 12, pp. 1598–1603, 2011.
[16] C.-Y. Su and J.-F. Yang, “Histogram of gradient phases: a new local descriptor for face recognition,” IET Computer Vision, vol. 8, no. 6, pp. 556–567, 2014.
[17] S. Ghafurian, I. Hacihaliloglu, D. N. Metaxas, V. Tan, and K. Li, “3D/2D image registration using weighted histogram of gradient directions,” in Medical Imaging: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 9415 of Proceedings of SPIE, pp. 1–7, 2015.
[18] P. Torrione, K. D. Morton, R. Sakaguchi, and L. M. Collins, “Histogram of gradient features for buried threat detection in ground penetrating radar data,” in Proceedings of the 32nd IEEE International Geoscience and Remote Sensing Symposium (IGARSS '12), pp. 3182–3185, July 2012.
[19] M. Sharma and H. Ghosh, “Histogram of gradient magnitudes: a rotation invariant texture-descriptor,” in Proceedings of the IEEE International Conference on Image Processing (ICIP '15), pp. 4614–4618, Quebec City, Canada, September 2015.
[20] M. A. Syarif, Th. S. Ong, and C. Tee, “Fingerprint recognition based on multi-resolution histogram of gradient descriptors,” in The 8th International Conference on Robotic, Vision, Signal Processing & Power Applications: Innovation Excellence Towards Humanistic Technology, vol. 291 of Lecture Notes in Electrical Engineering, pp. 189–196, Springer, Berlin, Germany, 2014.
[21] S. Ahn, J. Park, and J. Chong, “Blurring image quality assessment method based on histogram of gradient,” in Proceedings of the 19th Brazilian Symposium on Multimedia and the Web (WebMedia '13), pp. 181–184, Salvador, Brazil, November 2013.
[22] A. M. Treisman and G. Gelade, “A feature-integration theory of attention,” Cognitive Psychology, vol. 12, no. 1, pp. 97–136, 1980.
[23] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.
[24] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.
[25] X. He, D. Cai, S. Yan, and H.-J. Zhang, “Neighborhood preserving embedding,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), pp. 1208–1213, IEEE Computer Society, Beijing, China, October 2005.
[26] Y. Freund, “Experiments with a new boosting algorithm,” in Proceedings of the 30th International Conference on Machine Learning, vol. 13, pp. 148–156, 1996.
[27] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[28] G. E. Hinton and T. J. Sejnowski, “Learning and relearning in Boltzmann machines,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 45–76, MIT Press, Cambridge, Mass, USA, 1986.
Copyright © 2016 Bin Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.