Automatic Image Annotation Based on Particle Swarm Optimization and Support Vector Clustering
With the progress of network technology, the number of digital images on the Internet keeps growing. However, most of these images carry no semantic labels, which makes them difficult to retrieve and use. In this paper, a new algorithm is proposed to annotate images automatically based on particle swarm optimization (PSO) and support vector clustering (SVC). The algorithm has two stages: first, PSO is used to optimize SVC; second, the trained SVC is used to annotate images automatically. In the experiments, three datasets are used to evaluate the algorithm, and the results show its effectiveness.
With the popularization of digital cameras and other digital devices, the number of images on the network has increased exponentially, and image retrieval has become a hot research topic. According to the retrieval method, image retrieval techniques can be divided into two categories: text-based image retrieval (TBIR) and content-based image retrieval (CBIR) [2–4]. The advantage of TBIR is convenience: users obtain relevant results by searching for keywords. However, it requires manual annotation of images, and the workload is massive. CBIR searches for similar images based on the visual characteristics of images. Although there are many works on CBIR [5–10], the semantic gap still exists because the images are represented by low-level features such as color and texture. Many studies combine semantic information to improve content-based image retrieval, and semantic information is usually composed of textual keywords that describe the semantic attributes of images. Because manually annotating semantic information is time-consuming and laborious, automatic image annotation has become an increasingly crucial problem in image retrieval [11–19].
Because of the semantic gap, that is, the gap between low-level visual features such as color and texture and the high-level concepts that users typically employ when searching, the accuracy of CBIR is not adequate. This paper presents a novel automatic image annotation algorithm, named PSVC, based on particle swarm optimization (PSO) and support vector clustering (SVC). In PSVC, PSO replaces SMO for optimizing SVC, so support vectors can be obtained without setting parameter values by hand. Compared with SMO, PSO yields fewer and better support vectors. SVC is used to model the data uniformly, so that each cluster of images sharing the same semantics is described by a unified model. The semantic gap is further narrowed by using different models to describe image clusters with different semantics. Experimental results show the effectiveness of the algorithm.
2. Related Work
This work is related to the use of support vector machines (SVM) in automatic image annotation, and some related works are reviewed in this section.
Cusano et al. annotated images using SVM. Each image is processed by taking, for every pixel, a constant number of partially overlapping image subdivisions that include it. Each subdivision is then classified by a multi-class SVM, and each pixel is assigned to one of the categories based on the results.
A confidence-based dynamic ensemble built on SVM was proposed for multiclass image annotation.
Under the Multiple-Instance Learning (MIL) framework, image annotation was formulated as a supervised learning problem. Yang et al. proposed a new algorithm, ASVM-MIL (Asymmetrical Support Vector Machine-based MIL), which extends the conventional SVM to the MIL setting by introducing asymmetrical loss functions for false positive and false negative examples.
For automatic image annotation, Qi and Han integrated two sets of SVMs, namely the MIL-based and the global-feature-based SVMs. The MIL-based bag features are obtained by applying MIL to the image blocks, where a faster searching algorithm and an enhanced diversity density algorithm are used to improve accuracy and efficiency. The results are then input to one set of SVMs to find the optimal hyperplanes for annotating training images. Similarly, color and global texture features are input to another set of SVMs to categorize training images. For each test image, the two sets of image features are input to the two sets of SVMs respectively. The outputs are combined through an automatic weight estimation algorithm, and the image is annotated according to the final results.
Verma and Jawahar proposed an SVM-VT (SVM with Variable Tolerance) model to annotate images. An automatically determined tolerance parameter is introduced into the model, which makes it more tolerant of the mistakes that may occur when annotating images.
Murthy et al. proposed a hybrid model for image annotation based on a generative model, DMBRM (Discrete Multiple Bernoulli Relevance Model), and a discriminative model, SVM. The DMBRM is used to address data imbalance, and the SVM is used to address poor annotation.
Sahbi proposed an image annotation algorithm that trains SVMs with a class of kernels designed by minimizing objective functions that combine visual features and contextual cues from surrounding tags.
Gao et al. presented an automatic image annotation algorithm that incorporates feature hierarchy and boosting to scale up SVM image classifiers trained in a high-dimensional feature space. The high-dimensional, multi-modal and heterogeneous visual features are partitioned into multiple low-dimensional, single-modal, homogeneous feature subsets, each of which characterizes a certain visual property of images.
Sahbi and Li proposed a new image annotation method based on SVM and a novel class of kernels. The kernels take higher values when two images share both the same context and the same visual content.
Fan et al. presented an image annotation method that uses both the related semantic concepts and the prevailing image components. Salient objects are used to represent image content. To detect the salient objects automatically, SVM classifiers are used to learn detection functions. To obtain the semantic concepts, finite mixture models are employed to approximate the class distributions of the related salient objects. The model parameters and the optimal model structure are determined simultaneously by an EM algorithm.
For scalable image annotation, Alham et al. proposed a resource-aware parallel multiclass SVM algorithm (named RAMSMO), which divides the training dataset into smaller binary chunks and trains the support vector machines in parallel on a cluster of computers.
Although there are some works on automatic image annotation with SVM, few address automatic image annotation with SVC. Because of the semantic gap, the accuracy of automatic image annotation is not high. Therefore, the PSVC algorithm is proposed to improve the accuracy of automatic image annotation.
3. Automatic Image Annotation Based on PSVC
First, the training process of multi-class SVC is introduced. Second, PSO is used to optimize SVC. Third, the trained SVC is used to annotate images automatically.
3.1. Training Process of Support Vector Clustering for Multi-Class
Support vector clustering can be divided into single-class and multi-class support vector clustering. Single-class support vector clustering can only divide data samples into two categories, but most practical problems are multi-class, so multi-class support vector clustering is needed. Assuming there are k classes, k support vector machines must be created, each of which separates one class from the rest: the selected class is treated as positive and the remaining classes as negative, and so on for every class. The models then vote on a given sample, and the class with the highest score is taken as the sample's cluster.
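The one-against-rest voting described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the scoring functions stand in for trained per-class models, and the names `one_vs_rest_predict`, `make_center_scorer` and `scorers`, as well as the class labels, are hypothetical.

```python
from typing import Callable, Dict, Sequence

Scorer = Callable[[Sequence[float]], float]


def one_vs_rest_predict(x: Sequence[float], scorers: Dict[str, Scorer]) -> str:
    """Return the class whose one-vs-rest model gives x the highest score."""
    scores = {label: score(x) for label, score in scorers.items()}
    return max(scores, key=scores.get)


def make_center_scorer(center: Sequence[float]) -> Scorer:
    """Toy stand-in for a trained model: negative squared distance to a center."""
    return lambda x: -sum((a - b) ** 2 for a, b in zip(x, center))


# One scorer per class; each plays the role of "this class versus the rest".
scorers = {
    "sky": make_center_scorer([0.0, 1.0]),
    "grass": make_center_scorer([1.0, 0.0]),
    "sand": make_center_scorer([1.0, 1.0]),
}
predicted = one_vs_rest_predict([0.9, 0.1], scorers)
```

In a real system each scorer would be the decision function of a trained model; only the arg-max over per-class scores is the point of the sketch.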
The training process of multi-class SVC is as follows (Figure 1):
(1) Initialize the training data and the initial marks;
(2) Train support vector clustering on the training data and the initial marks to obtain a result; if the termination condition is met, stop the iteration, otherwise go to (3);
(3) Train support vector clustering on the training data and the current marks to obtain a new result;
(4) If the termination condition is met, stop the iteration; otherwise update and go to (3).
The training process is relatively simple and converges quickly. Because tag allocation does not take place inside the clustering algorithm during SVC training, the training time is significantly reduced.
3.2. Optimizing SVC Based on PSO
The most common method of training support vector clustering is sequential minimal optimization (SMO), which is used to obtain the Lagrangian multipliers. The combination of two parameters plays a decisive role: the penalty parameter and the relaxation factor. The value of the penalty parameter directly affects the empirical error and the complexity of the learned model. The relaxation factor controls the number of support vectors in the support vector clustering and thereby the sparse points. However, these parameters are often set by experience, which brings uncertainty to the results. In this paper, PSO is used instead of SMO to train SVC, so support vectors can be obtained without setting the parameter values.
According to the support vector clustering model, the Lagrangian multipliers must be found first. The fitness function of a particle is:
The initial position of the particle should meet the following constraints:
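For orientation, the standard SVC formulation obtains the multipliers by maximizing the Wolfe dual below. It is given here only as an assumption about the general shape of the objective and constraints; the symbols $\beta_j$ (multipliers), $K$ (kernel), $C$ (penalty bound) and $N$ (number of training points) are illustrative and are not taken from this paper.

```latex
\max_{\beta}\; W(\beta) = \sum_{j=1}^{N} \beta_j K(x_j, x_j)
  - \sum_{i=1}^{N}\sum_{j=1}^{N} \beta_i \beta_j K(x_i, x_j),
\qquad \text{s.t.}\quad 0 \le \beta_j \le C,\quad \sum_{j=1}^{N} \beta_j = 1 .
```

Under this reading, a particle's position would be a candidate vector of multipliers, its fitness the value $W(\beta)$, and the constraints the conditions that an initial position has to satisfy.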
Because particles can become trapped at local optima, the efficiency of the PSO algorithm is reduced and the convergence speed of the particles is severely affected. In this paper, SVC is trained while dynamically changing the weight parameters to accelerate the convergence of PSO. The main steps of optimizing SVC with PSO are as follows.
Each particle has velocity information and location information, and the swarm contains a fixed total number of particles. Each particle keeps the best position it has found, and the swarm keeps the global best position known to the particles.
(i) Initialize all particles. For each particle:
(a) Generate random numbers from a uniform distribution bounded by the upper and lower bounds of the search space. The iteration counter is set to its initial value, and the maximum number of iterations is 1000;
(b) Initialize the location information of the particle;
(c) If the update condition holds, update the best location information of the ith particle;
(d) Initialize the velocity information of the ith particle.
(ii) If the maximum number of iterations is reached or the global optimal solution is found, stop the calculation; otherwise repeat the following steps for each particle:
(a) For each dimension:
(1) Select random numbers;
(2) Update the particle's velocity;
(3) Check whether the velocity is within its bounds.
(b) For each dimension, update the particle's position;
(c) If the new position improves on the particle's best position:
(1) Update the particle's best position;
(2) If it also improves on the swarm's best position, update the swarm's best position.
(iii) Following step (ii), find the global optimum.
(iv) The global optimal solution gives the Lagrangian multipliers of the support vector clustering; the decision function is then calculated, and the number and values of all support vectors are obtained. The formula can be expressed in the following form:
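The steps above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fitness is taken to be the standard SVC dual with a Gaussian kernel of fixed width `q`, the "dynamically changing weight" is read as a linearly decreasing inertia weight, infeasible positions are repaired by clipping and rescaling (`project`), and the upper bound on the multipliers is omitted. All function names and parameter values are hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, q=1.0):
    """Gaussian kernel exp(-q * ||a - b||^2); q is an assumed width parameter."""
    return math.exp(-q * sum((u - v) ** 2 for u, v in zip(a, b)))


def svc_dual(beta, gram):
    """Assumed fitness (to maximize): sum_j b_j K_jj - sum_ij b_i b_j K_ij."""
    n = len(beta)
    linear = sum(beta[j] * gram[j][j] for j in range(n))
    quad = sum(beta[i] * beta[j] * gram[i][j] for i in range(n) for j in range(n))
    return linear - quad


def project(beta):
    """Assumed repair step: clip to beta_j >= 0 and rescale so sum(beta) = 1."""
    clipped = [max(b, 0.0) for b in beta]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(beta)] * len(beta)
    return [b / total for b in clipped]


def pso_train(points, n_particles=20, max_iter=200, w_start=0.9, w_end=0.4,
              c1=2.0, c2=2.0, v_max=0.2, seed=0):
    """Search for multipliers with PSO; returns (best position, best fitness)."""
    rng = random.Random(seed)
    n = len(points)
    gram = [[gaussian_kernel(a, b) for b in points] for a in points]
    # Step (i): initialize positions, velocities, personal and global bests.
    pos = [project([rng.random() for _ in range(n)]) for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(n)] for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [svc_dual(p, gram) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    # Step (ii): iterate until the iteration budget is used up.
    for it in range(max_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * it / max(max_iter - 1, 1)
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (best_pos[i][d] - pos[i][d])
                     + c2 * r2 * (g_pos[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # keep velocity in bounds
            pos[i] = project([pos[i][d] + vel[i][d] for d in range(n)])
            fit = svc_dual(pos[i], gram)
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
                if fit > g_fit:
                    g_pos, g_fit = pos[i][:], fit
    return g_pos, g_fit


# Toy data: three nearby points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]
beta, fitness = pso_train(points)
```

Points whose multiplier ends up clearly above zero would play the role of support vectors. The repair step is only one simple way to keep particles feasible; penalty terms are another common choice.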
Particle swarm optimization searches for the optimal solution by iterating continuously over randomly initialized particles. When support vector clustering is solved as a quadratic programming problem, the first difficulty lies in the linear and inequality constraints, which make the computation costly in both time and memory. By using particle swarm optimization, the learning ability of support vector clustering is improved, and the trained model is used for automatic image annotation.
3.3. Process of Automatic Image Annotation
3.3.1. Extracting Images’ Features Based on CPAM
The colored pattern appearance model (CPAM) not only extracts the visual features around salient points but also captures the visual features of the whole picture, so it is well suited to image semantic annotation. CPAM is used to compress the image and extract its characteristics, including color and texture. Based on these, the information required by support vector clustering is generated. Finally, a probability estimation method is used to mark a given image through the support vector description model. The feature values extracted by CPAM contain statistics of chromatic and achromatic spatial patterns, which represent colors and textures. Through CPAM, the images are split into blocks, each described by achromatic spatial pattern histograms (ASPH) and chromatic spatial pattern histograms (CSPH). The feature vector of an image extracted by CPAM contains 128 values, of which 64 are color values and 64 are achromatic pattern values. Thus, each image is represented by a 128-dimensional vector.
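The layout of the 128-dimensional vector can be illustrated with a short sketch. It does not implement CPAM itself: it assumes that some quantizer has already mapped every image block to one of 64 achromatic and one of 64 chromatic codewords, and it only shows how two 64-bin histograms are concatenated. The function names and the ordering (achromatic first) are assumptions.

```python
def histogram(indices, n_bins=64):
    """Normalized histogram of codeword indices, each in range(n_bins)."""
    counts = [0] * n_bins
    for idx in indices:
        counts[idx] += 1
    total = len(indices)
    return [c / total for c in counts] if total else [0.0] * n_bins


def cpam_style_feature(achromatic_codes, chromatic_codes):
    """Concatenate a 64-bin achromatic and a 64-bin chromatic histogram."""
    return histogram(achromatic_codes) + histogram(chromatic_codes)


# Hypothetical codeword indices for an image that was split into four blocks.
feature = cpam_style_feature([0, 0, 5, 63], [1, 1, 1, 2])
```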
3.3.2. Training Single Support Vector Clustering
Figure 2 shows the process of training a single support vector clustering model. The training data consist of image clusters, each containing a number of images. Each cluster is manually marked with a set of semantic words that describe the class, where the set of semantic words is a subset of the whole thesaurus. An example of semantic word annotation is given in Box 1.
For each training image, feature values based on color and texture are extracted using CPAM. During training, these feature values are mapped into a high-dimensional feature space by a non-linear transformation, and the minimum enclosing balls are sought; each ball represents a separate class.
3.3.3. Clustering Density Estimation
Consider a data set with an underlying probability distribution. The distribution is approximated by a weighted kernel probability density estimate, in which each window function is multiplied by the weight of that window. Without loss of generality, the Gaussian function is used as the window function:
In a single support vector clustering model, the enclosing ball in the high-dimensional space is described by the trained kernel radius equation. Given the radius of the ball and test data , let be outside the ball, then: Because of , then: Similarly, if the test data is inside or on the sphere, then: According to formulas (5), (6) and , then: The probability density follows the formula:
So the following relationship is drawn: (1) if the input data is outside the sphere, then ; (2) similarly, if the input data is located on the surface or inside of the sphere, then .
The trained kernel radius function of the support vector clustering is used to obtain the distribution of the clusters through a probability density estimate. For given test data, the probability density gradually increases as the test data approach the center of the sphere.
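The weighted kernel density estimate used in this subsection can be sketched as follows. This is a minimal illustration, assuming an unnormalized Gaussian window with a fixed width `q` and weights that sum to one (for example, the trained multipliers); the names `weighted_kernel_density`, `centers` and `weights` are hypothetical.

```python
import math


def weighted_kernel_density(x, centers, weights, q=1.0):
    """Weighted sum of Gaussian windows exp(-q * ||x - c||^2) over the centers."""
    total = 0.0
    for c, w in zip(centers, weights):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, c))
        total += w * math.exp(-q * sq_dist)
    return total


# Two hypothetical support vectors with equal weights.
centers = [(0.0, 0.0), (1.0, 0.0)]
weights = [0.5, 0.5]
near = weighted_kernel_density((0.5, 0.0), centers, weights)  # between the centers
far = weighted_kernel_density((5.0, 5.0), centers, weights)   # far from both
```

As the text states, the estimate grows as the test point moves toward the region enclosed by the sphere, so `near` is larger than `far`.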
3.3.4. Automatic Image Annotation
The automatic image annotation system consists of a single support vector clustering model, which is based on feature information such as color and texture. The support vector clustering model has several sub-models, each of which has been marked with the relevant semantic words. Given a test image, the CPAM model is used to extract its feature information. Then, the probability density of the test image is calculated by each single support vector clustering model, using the feature vector of the test image extracted by the CPAM model as input. Each sub-model carries the set of semantic words that describes its class, and each semantic word is correlated with the test image through the probability density. From the set of all semantic words, the semantic word sets that contain a given word are collected, and that word is then assigned a probability density value for the test image. The formula is as follows:
An instruction vector with one component per semantic word is created, and all of its components are initialized to zero. The probability density values are then sorted in descending order, and the component corresponding to the highest value is set to the maximum. In this way, an instruction vector is created for the test image at each step. The semantic word with the maximum instruction vector value is the final decision, and that semantic word is selected. The same method is used for the other semantic words. A threshold sets the number of annotation words for each test image; usually the threshold is four. The process of automatic image annotation is shown in Figure 3.
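The word selection step can be sketched as follows. This is only one plausible reading of the procedure: it assumes that the score of a semantic word is the sum of the densities of all clusters whose word set contains it, and that the four best-scoring words are kept. The scoring rule, the function name `annotate` and the sample data are assumptions, not the authors' formula.

```python
def annotate(cluster_density, cluster_words, n_words=4):
    """Rank semantic words by accumulated cluster density and keep the top ones."""
    word_score = {}
    for cluster, density in cluster_density.items():
        for word in cluster_words[cluster]:
            word_score[word] = word_score.get(word, 0.0) + density
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(word_score, key=lambda w: (-word_score[w], w))
    return ranked[:n_words]


# Hypothetical word sets of three clusters and densities of one test image.
cluster_words = {
    0: ["sky", "cloud", "blue"],
    1: ["beach", "sky", "sand"],
    2: ["tiger", "grass"],
}
cluster_density = {0: 0.6, 1: 0.3, 2: 0.05}
labels = annotate(cluster_density, cluster_words)
```

The default `n_words=4` mirrors the threshold of four annotation words mentioned in the text.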
4. Experimental Results
The experimental environment is shown in Table 1, and the programming environment is Matlab and Visual C++ 6.0. The Corel 60k data set is used as experimental data; it contains 60,000 images in JPEG format, in one of two sizes. The data set is divided into 600 clusters of 100 images each, and the images in a cluster belong to the same theme. Each cluster is manually marked with a group of descriptive words that describe the whole cluster rather than a single image. The vocabulary contains 441 words in total, and the average number of annotations per cluster is 3.6. In addition to the Corel 60k data set, the Google 6000 data set, which contains 6000 images downloaded from the Internet through Google, is used to test the PSVC algorithm.
4.1. Feature Extraction
Features are extracted from all 600 clusters (each containing 100 images) using the CPAM method. The extracted feature values are saved to one file. In the file, zero values are omitted, and the labels 0 to 599 indicate the different clusters. In Box 2, the first field “1” is the cluster label; the second field begins with “1:”, which indicates the first dimension of the image's feature vector, and the “0.0076497” after the colon is the value of that dimension, and so on.
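The file layout described above (a label followed by sparse `index:value` pairs, with zeros omitted) can be read with a few lines of code. The sketch below assumes whitespace-separated fields and 1-based indices; the function name `parse_sparse_line` is hypothetical, and in the sample line only the label “1” and the entry “1:0.0076497” come from the text, the other entries are made up for illustration.

```python
def parse_sparse_line(line, n_dims=128):
    """Parse 'label index:value ...' into (label, dense feature vector)."""
    parts = line.split()
    label = int(parts[0])
    vector = [0.0] * n_dims  # omitted entries are zeros
    for item in parts[1:]:
        index, value = item.split(":")
        vector[int(index) - 1] = float(value)  # indices in the file are 1-based
    return label, vector


# Only "1" and "1:0.0076497" are from the text; "3:..." and "128:..." are made up.
label, vector = parse_sparse_line("1 1:0.0076497 3:0.0125 128:0.0042")
```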
4.2. Training Data Sets Using PSO Algorithm
The extracted feature values are used as inputs to the PSO algorithm, and experiments are repeated on the 600 clusters to determine the parameters. The parameter values of PSO are shown in Table 2.
Because the main factor affecting the PSO algorithm is the number of iterations, five clusters are randomly selected to show the relationship between the number of iterations and convergence. In Figure 4, the horizontal axis is the number of iterations and the vertical axis is the convergence. After about 1000 iterations, all five clusters have converged. Figure 4 also shows that different clusters may converge at different rates. A cluster that converges quickly is easy to annotate automatically.
The PSO algorithm is mainly used to obtain the Lagrangian multipliers in support vector clustering training and then to find the support vectors. With the support vectors, the keywords of the test image are extracted by the probability calculation. The resulting Lagrangian multipliers are saved to the files coef 0~599. One such file, opened in Notepad, is shown in Box 3.
In Box 3, the data sample has five support vectors, and the Lagrangian multiplier is the first element of each row. A file containing only the Lagrangian multipliers, opened in Notepad, is shown in Box 4.
For comparison, the Libsvm software, which uses the SMO algorithm, is used to train on the data to obtain the Lagrangian multipliers and the support vectors. The Libsvm parameters are shown in Table 3; the remaining parameters keep their default values. The results of the PSO algorithm and the SMO algorithm are shown in Table 4.
The 600 clusters are annotated using the PSVC algorithm and Libsvm respectively, and five clusters are randomly selected for analysis. In Figure 5, the horizontal axis represents the five randomly selected clusters, and the vertical axis represents the accuracy with which images are successfully marked. The experimental results show that the PSO algorithm performs better than the SMO algorithm for clusters 1, 31, 45 and 133, whereas for cluster 24 the SMO algorithm is slightly better. In general, using the PSO algorithm for automatic image annotation is better than using the SMO algorithm.
4.3. Performance Comparison with Other Algorithms
The PSVC algorithm is compared with other algorithms on Corel 60k, including SVC using SMO, the TagProp model, the TGLM model, the Group Sparsity model, GPR and CS. Three measures, “average precision” (P), “average recall” (R) and “F1-Score”, are used to evaluate the algorithms, and the results are shown in Table 5.
Table 5 shows that the precision of PSVC is greatly improved, which demonstrates the feasibility of the model. Although its recall is lower than that of the GPR algorithm, it still reaches a high level, and its comprehensive index, F1-Score, is the highest in Table 5.
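For reference, the three measures can be computed for a single image as sketched below. The averaging protocol behind the reported values (per word or per image) is not stated in the text, so this sketch only shows the per-image definitions; the function name and the sample word lists are hypothetical.

```python
def precision_recall_f1(predicted, truth):
    """Precision, recall and F1 of predicted annotation words against the truth."""
    predicted, truth = set(predicted), set(truth)
    hits = len(predicted & truth)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(truth) if truth else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical example: four predicted words, three ground-truth words, two hits.
p, r, f1 = precision_recall_f1(["sky", "cloud", "sea", "boat"],
                               ["sky", "sea", "beach"])
```

Because F1 is the harmonic mean of precision and recall, a method can reach the best F1 without having the best recall, which matches the pattern described for Table 5.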
Using SIFT and CPAM to extract features separately, PSVC is used to annotate images from the ImageNet dataset, which is a popular dataset in image annotation. The results are shown in Table 6, from which it can be seen that SIFT and CPAM are both effective feature extraction methods.
4.4. Annotating Google 6000 Data Sets Using PSVC
To further evaluate the algorithm, PSVC is applied to the Google 6000 data set downloaded from the Internet. As shown in Table 7, the test images are successfully marked.
5. Conclusion and Future Work
With the progress of network technology, the number of digital images on the Internet keeps growing. To facilitate the management of these massive collections of digital images, semantic annotation of images is needed. In this paper, the PSVC algorithm, which combines the PSO algorithm and the SVC algorithm, is proposed to annotate images automatically. In the next stage, image-based search built on automatic image tagging, which can be used in multiple areas, will be studied.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
N. Vasconcelos and M. Kunt, “Content-based retrieval from image databases: current solutions and future directions,” in Proceedings of the 2001 International Conference on Image Processing, vol. 3, pp. 6–9.
K. Barnard, P. Duygulu, and D. Forsyth, “Clustering art,” in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 434–439.
Q. Zhang, S. A. Goldman, and W. Yu, “Content-based image retrieval using multiple-instance learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 682–689, 2002.
C. Yang, M. Dong, and J. Hua, “Region-based image annotation using asymmetrical support vector machine-based multiple-instance learning,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2006, pp. 2057–2063, USA, June 2006.
H. Sahbi, “CNRS-TELECOM Paris Tech at Image CLEF 2013 Scalable Concept Image Annotation Task: Winning Annotations with Context Dependent SVMs,” in CLEF 2013 Evaluation Labs and Workshop, Online Working Notes, pp. 23–26, 2013.
H. Sahbi and X. Li, “Context based support vector machines for interconnected image annotation,” in the Asian Conference on Computer Vision (ACCV), 2010.
J. Fan, Y. Gao, H. Luo, and G. Xu, “Automatic image annotation by using concept-sensitive salient objects for image content representation,” in Proceedings of the 27th Annual International Conference on Research and Development in Information Retrieval, pp. 361–368, 2004.
J. C. Platt, Fast Training of Support Vector Machines Using Sequential Minimal Optimization, MIT Press, 1999.
P. Hall and B. A. Turlach, “Reducing bias in curve estimation by use of weights,” Computational Statistics & Data Analysis, vol. 30, no. 1, pp. 67–86, 1999.
A. Abdiansah and R. Wardoyo, “Time complexity analysis of Support Vector Machines (SVM) in LibSVM,” International Journal of Computer Applications, vol. 128, no. 3, pp. 975–8887, 2015.
L. Denoyer and P. Gallinari, “A Ranking based model for automatic image annotation in a social network,” in Proceedings of the Fourth International AAAI Conference on Weblogs and Social, pp. 231–234, 2016.
M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, “TagProp: discriminative metric learning in nearest neighbor models for image auto-annotation,” in Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV '09), pp. 309–316, IEEE, Kyoto, Japan, September-October 2009.
S. Kharkate K and J. Janwe P N, “A novel approach for automatic image annotation using color saliency,” International Journal of Innovative Research in Computer & Communication Engineering, vol. 1, no. 5, pp. 1142–1148, 2013.