Special Issue: Analysis and Applications of Location-Aware Big Complex Network Data
Qi Zhu, Ning Yuan, Donghai Guan, "Cognitive Driven Multilayer Self-Paced Learning with Misclassified Samples", Complexity, vol. 2019, Article ID 8127869, 10 pages, 2019. https://doi.org/10.1155/2019/8127869
Cognitive Driven Multilayer Self-Paced Learning with Misclassified Samples
Abstract
In recent years, self-paced learning (SPL) has attracted much attention due to the improvement it brings to machine learning algorithms based on non-convex optimization. As a methodology borrowed from human learning, SPL dynamically evaluates the learning difficulty of each sample and provides a weighted learning model that guards against the negative effects of hard-learning samples. In this study, we propose a cognitive-driven SPL method, retrospective robust self-paced learning (R2SPL), inspired by two observations about the human learning process: misclassified samples leave a stronger impression on subsequent learning, and a model from a later learning stage, built on a large number of samples, can be used to reduce the risk of poor generalization in the initial learning phase. We simultaneously estimate the degree of learning difficulty and the misclassification status of each sample in every step of SPL, and we propose a framework for constructing multilevel SPL that improves the robustness of the initial learning phase of SPL. The proposed method can be viewed as a multilayer model in which the output of the previous layer guides the construction of a robust initialization model for the next layer. The experimental results show that R2SPL outperforms conventional self-paced learning models on classification tasks.
1. Introduction
By arranging samples in a meaningful learning order based on prior knowledge, curriculum learning (CL) [1] provides an easy-to-hard learning process, which makes the model fit human cognition better. To make curriculum learning more practical for machine learning problems, Kumar et al. [2] adaptively assessed sample learning difficulty within the model and proposed the self-paced learning method. Specifically, a self-paced algorithm actively and dynamically derives the initial learning sequence from the original data and gradually adds harder samples at each iteration, whereas in curriculum learning the sample learning sequence is preset. By predefining or dynamically generating the learning sequence, curriculum learning and self-paced learning can help the objective function avoid falling into a bad local optimum. Many researchers have applied curriculum learning and self-paced learning to tough pattern recognition problems. Jiang et al. [3] proposed self-paced curriculum learning, which not only obtains a dynamic sample sequence during model learning but also uses prior knowledge to avoid overfitting. Zhao et al. [4] applied self-paced learning to the non-convex matrix factorization problem, suppressing the effect of noise and outliers in the data on the model; they also pointed out that the strategy of adaptively selecting easy-learning sample sequences resembles the process of human cognition. Supancic and Ramanan [5] adopted self-paced learning for long-term tracking and achieved promising results. Self-paced learning has been introduced into many learning models and has shown good performance in many real-world applications.
Self-paced learning seems to challenge conventional learning methods such as active learning, boosting, and transfer learning. From the machine learning point of view, boundary samples, noise samples, and outliers increase the uncertainty of the model and may make it generate a bad classification boundary. Therefore, compared to easy-learning samples, hard-learning samples have drawn much attention in conventional models. In our work, we deal with supervised learning problems, in which easy-learning samples correspond to samples with small loss and hard-learning samples to samples with large loss. In unsupervised learning, easy-learning samples are those that are easy to assign, while hard-learning samples are those that make the model unstable. In this paper, misclassified samples are those for which the product of the predicted value and the label is negative. Typically, in AdaBoost learning, the model trains the classifier by changing the sample distribution based on the misclassified samples of previous iterations [6]. Li et al. [7] trained a sequence of AdaBoost classifiers, starting with a weak learner and progressively boosting it into a strong learner. Active learning is a kind of semi-supervised learning that chooses to label the samples most valuable to the model. Low-confidence samples that may contain useful information are difficult to choose and require additional expert knowledge to identify. Tur et al. [8] presented a spoken language understanding method combining active and semi-supervised learning with human-labeled and automatically labeled data. Huang et al. [9] proposed a systematic framework to simultaneously measure the informativeness and representativeness of an instance.
The informativeness criterion reflects the ability of samples to reduce the uncertainty of the model based on the labeled data, while the representativeness criterion measures which samples can best represent the unlabeled data. In contrast, a self-paced learning model first considers the easy-learning samples with small prediction loss and gradually adds hard-learning samples with larger prediction loss to extend the training set. The difference between self-paced learning and transfer learning is that transfer learning improves the generalization of the model by sharing models across different tasks [10], whereas self-paced learning updates and learns by itself to obtain a local optimal solution.
Study [11] pointed out the inherent consistency between human cognition and reinforcement learning. In dealing with a learning problem, humans and other animals use a harmonious combination of repeated learning and hierarchical sensory processing systems. In self-paced learning, the initial model is trained insufficiently with only a few easy-learning samples, which increases the learning risk of follow-up iterations and may even reduce the generalization of the final model. The usual remedies for the small sample problem include feature selection [12], regularization, adding artificial samples [13], etc. In order to improve the generalization of the initial model, which is built from a small number of samples in self-paced learning, we design a recurrent framework that uses the model from the previous self-paced learning pass to repeatedly construct the initial model. In analogy with the repeated learning process of humans, if the initial model inherits the properties of a model learned from a large number of samples, the final model may be more robust and discriminative.
Meanwhile, although self-paced learning and some conventional machine learning methods (AdaBoost, active learning, and transfer learning) differ greatly in sample processing, we can still absorb the advantages of these conventional methods into self-paced learning. Specifically, in this paper, we propose retrospective robust self-paced learning (R2SPL). In each iteration of self-paced learning, besides considering easy-learning samples, the samples misclassified in the last iteration are also involved in training the model. For example, if hard-learning samples (whose categories are difficult to determine) form the majority of the data, conventional self-paced learning may not reach a good local optimal solution. In this case, our proposed method focuses on both easy-learning samples and misclassified samples in each iteration, which drives the final mature model to be robust and discriminative.
Overall, our main contributions can be summarized as follows:
(i) We introduce misclassified samples, together with easy samples with small loss, in each iteration to guide the model to become more discriminative.
(ii) Retrospective self-paced learning is proposed to improve the robustness of the initialization of self-paced learning.
(iii) Experimental results show that the proposed method achieves promising results in classification tasks.
The remainder of this paper is organized as follows. We briefly introduce related work on self-paced learning in Section 2. We propose the robust SPL in Section 3. In Section 4, we conduct experiments on the UCI and ADNI datasets. We provide the conclusion and our future research plan in Section 5.
2. Related Work
2.1. Curriculum Learning and Self-Paced Learning
In 2009, Bengio et al. [1] proposed curriculum learning, a method that imitates the order in which children are taught. Unlike conventional machine learning methods that learn from all samples at once, their work sorts the samples in a meaningful order and learns the model in several stages. Benefiting from prior knowledge, curriculum learning can achieve better results than other machine learning models on some tasks. However, arranging the sample order usually requires expert identification, which increases the difficulty and cost of building the model. In addition, the ordered sample sequence is static and lacks flexibility for new samples or tasks. To alleviate this deficiency, Kumar et al. [2] proposed self-paced learning in 2010. Without any prior knowledge or expert identification, self-paced learning can dynamically order the samples from easy to difficult based on the fit between the samples and the model. In multimedia retrieval, Jiang et al. [14] proposed a self-paced reranking model for multimodal data, and the model made significant progress on both image and video search tasks. Zhou et al. [15] brought self-paced learning to deep neural networks, adaptively involving faithful samples in the training process. By analyzing the working mechanism of self-paced learning, Fan et al. [16] proposed a general implicit regularization framework. As self-paced learning has been adopted in many models, the commonality among them lies in the sample processing: in each iteration, these models usually pick the high-confidence samples that fit the model better to construct the current model and gradually use the remaining low-confidence samples to fine-tune the model so that it generalizes better.
Curriculum learning was the first attempt to combine a human cognition sequence with a machine learning model. Although curriculum learning has some drawbacks, it brought the idea of easy-to-hard learning to later models. Self-paced learning is an extension of curriculum learning that is more flexible and concise. Similar to human learning, self-paced learning trains on samples from easy to difficult and gradually improves the robustness of the model.
2.2. Tough Samples Learning
In its sample processing strategy, self-paced learning differs from tough-sample-focused learning methods such as AdaBoost, active learning, and transfer learning. In our work, we try to finely distinguish different types of samples, including easy-learning samples, hard-learning samples, and misclassified samples, and give them different weights in the model. By combining simple classifiers, AdaBoost can deal with complicated problems. For example, in many multiclass problems [17, 18], the distribution of samples is highly complex [19]. Like SVM, AdaBoost can asymptotically achieve a margin distribution that is robust to noise [7, 20, 21]. Active learning is a semi-supervised model that uses unlabeled samples to improve a model obtained from labeled samples. However, since the unlabeled samples have no tags, samples whose classes are difficult to distinguish usually need to be annotated manually; if these samples are labeled by the model itself, the uncertainty of the model may increase. Lin et al. [22] proposed active self-paced learning, which uses the characteristics of both models to automatically annotate high-confidence and low-confidence samples and incorporates them into training under weak expert re-certification. Kumar et al. [2] pointed out that certainty does not imply correctness. Many researchers have applied SVM and active learning to practical applications such as text classification [23, 24], image retrieval [25], and image segmentation [26]. In transfer learning, the model adjusts the weights of data from the original domain, which increases the similarity between the target-domain and source-domain data [10]. In children's learning, some problems share a common underlying structure but differ in surface manifestations, which is similar to the characteristics of transfer learning [27].
In order to make the models close to human wisdom, many researchers combine the models with environmental feedback and transfer learning [28–30].
In our work, we focus on both easy-learning samples and tough samples, which improves the discrimination of self-paced learning. In each iteration, we simultaneously select easy-learning samples and misclassified samples for training. As in human cognition, learning the high-confidence samples and the low-confidence samples together in each iteration helps improve the generalization of the final self-paced model.
3. Proposed Method
3.1. Robust SPL
Specifically, we define a diagonal weight matrix to encode the misclassification weight of each sample. Let $y_i$ and $\hat{y}_i$ denote the label and predicted value of the $i$th sample, respectively. For a binary classification problem, if $y_i \hat{y}_i > 0$, the $i$th sample is correctly classified; otherwise, it is considered a misclassified sample. In our work, the weights of the misclassified samples ($y_i \hat{y}_i < 0$) in the weight matrix should be larger than those of the correctly classified samples, and the range of these weights should not vary greatly. Therefore, we adopt the sigmoid function, shown in Figure 1, as the weight function with respect to the product of the label and the predicted value: given the label vector, the data matrix, and the current model parameters, the misclassification weight of the $i$th sample is a sigmoid that decreases with $y_i \hat{y}_i$.

For a supervised problem, the self-paced learning function assigns weights to samples based on the sample loss, and samples with small loss are viewed as easy-learning samples. Our model, however, simultaneously considers easy-learning samples and tough samples in each iteration. Specifically, we linearly combine the misclassification weight matrix and the self-paced weight matrix, so the objective consists of the weighted loss, a regularization term, and the self-paced regularizer. To embed structural information in feature extraction, we adopt the $\ell_p$ norm on the regression coefficient; the closer the value of $p$ gets to 0, the sparser the result of the feature extraction. The self-paced weight function has a pace parameter that controls the number of samples used to construct the model. At first, only a few samples with small loss are used to construct the model; as the pace parameter is updated, more and more samples with larger loss join the model training process.
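The sigmoid misclassification weighting described above can be sketched as follows; this is a minimal illustration, not the paper's exact formula, and the slope parameter `k` and the linear predictor `X @ w` are assumptions for the sake of a concrete example.

```python
import numpy as np

def misclassification_weights(y, X, w, k=5.0):
    """Sigmoid-shaped weight on the product y_i * y_hat_i.

    Misclassified samples (negative product) receive weights above 0.5
    and correctly classified ones below 0.5; the sigmoid keeps the range
    bounded so the weights do not vary too greatly, as the text requires.
    The slope k is a hypothetical choice, not from the paper.
    """
    margin = y * (X @ w)                      # y_i * predicted value
    return 1.0 / (1.0 + np.exp(k * margin))   # larger weight when margin < 0
```

A misclassified sample thus automatically receives more attention in the next weighted fit, while the bounded sigmoid prevents any single sample from dominating.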
Whatever form the self-paced function takes, it should satisfy three properties [3, 14]: it is convex with respect to the sample weights; the sample weight should be monotonically decreasing with respect to its corresponding loss; and the sample weight should be monotonically decreasing with respect to the pace parameter.
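An objective of the kind described above, combining the weighted loss, a self-paced regularizer, and the $\ell_p$ penalty, can be written in the standard SPL form as a sketch (the symbols $v_i$, $m_i$, $\eta$, and $\beta$ are our notation for the self-paced weights, misclassification weights, mixing coefficient, and regularization strength, not necessarily the paper's):

```latex
\min_{\mathbf{w},\; \mathbf{v}\in[0,1]^{n}}
  \sum_{i=1}^{n} \left(v_i + \eta\, m_i\right)
  \bigl(y_i - \mathbf{w}^{\top}\mathbf{x}_i\bigr)^{2}
  \;+\; f(\mathbf{v};\lambda)
  \;+\; \beta \,\lVert \mathbf{w} \rVert_p^p
```

Here $f(\mathbf{v};\lambda)$ is the self-paced regularizer satisfying the three properties listed above, and the combined weight $v_i + \eta\, m_i$ realizes the linear combination of the self-paced and misclassification weight matrices.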
Meanwhile, in the process of human cognition, people usually make mistakes due to a lack of knowledge. By constantly reviewing unfamiliar and misunderstood concepts, people form a more robust knowledge system. Notably, if children get help from adults (such as teachers and parents) during cognition, they can construct their knowledge framework more rapidly and soundly. Self-paced learning resembles the education of children without the help of adults, so the learned initial model may not be robust enough. To alleviate this deficiency, we propose retrospective robust self-paced learning: we cascade multiple self-paced learning passes, which helps reduce the negative impact of the small sample size problem on the initialization of each follow-up self-paced learning pass. Naturally, in the next self-paced learning pass, the learning rate can be moderately increased. Repeating this process several times yields a more robust and discriminative model. The framework of the proposed method can be viewed as a multilayer network, shown in Figure 2. First, we construct the initial model from the easy-learning samples; this initial model lacks discriminability because it is trained on too few samples. Then, we adopt more hard-learning and misclassified samples to retrain the model, which drives it to be more robust and discriminative. When the training of one layer is finished, its converged result is used as prior knowledge for the next layer. This operation is repeated until all n layers have been trained, since each self-paced learning stage is essentially one layer of the network. Specifically, the output of the first layer can guide the choice of easy-learning samples in the initialization of the second layer, which can be expected to be more robust than a model learned independently.
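The cascaded training loop above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses a squared loss with the binary self-paced scheme, a plain weighted ridge fit as the base learner, and omits the misclassification weights for brevity; `lam0`, `step0`, and `n_iter` are hypothetical settings (the text only fixes the first layer's step size at 0.1 and the number of layers at 3).

```python
import numpy as np

def weighted_ridge(X, y, s, reg=1e-2):
    # Weighted least squares: minimize sum_i s_i (y_i - w.x_i)^2 + reg*||w||^2
    d = X.shape[1]
    Xs = X.T * s                                   # X^T diag(s)
    return np.linalg.solve(Xs @ X + reg * np.eye(d), Xs @ y)

def retrospective_spl(X, y, n_layers=3, lam0=0.5, step0=0.1, n_iter=20):
    """Cascade of SPL passes: each layer warm-starts from the previous
    layer's model and uses a larger pace step, as described in the text."""
    w = np.zeros(X.shape[1])
    step = step0
    for _ in range(n_layers):
        lam = lam0
        for _ in range(n_iter):
            loss = (y - X @ w) ** 2                # per-sample squared loss
            v = (loss < lam).astype(float)         # binary self-paced weights
            w = weighted_ridge(X, y, v)            # refit on admitted samples
            lam += step                            # admit harder samples
        step *= 2.0                                # later layers move faster
    return w
```

The key point is the warm start: the second layer's first iterations already select "easy" samples according to the first layer's converged model rather than a cold-start model.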
3.2. Optimization
Since the parameters are independent of each other, we can fix the other parameters when computing each of them. In the $t$th iteration, each parameter can be calculated from the parameters of the $(t-1)$th iteration. In each iteration, every subproblem is convex, so the optimal solution of each parameter can be obtained. The solutions for the regression coefficient, the self-paced weights, and the misclassification weights are presented as follows.
(1) The Solution of the Regression Coefficient. To simplify the calculation, we convert the second term of (4) into a quadratic form involving a diagonal matrix whose diagonal elements are computed from the current regression coefficient. Then, (4) can be equivalently reformulated as (6). Taking the derivative of (6) with respect to the regression coefficient and setting it to 0 yields the optimal closed-form solution. In the next iteration, the model corrects its mistakes by guiding the regression coefficient with the misclassification weights. Under the combined influence of the self-paced weights and the misclassification weights, which correspond to easy samples with small loss and misdirected samples, respectively, our proposed self-paced learning model is more robust and discriminative than conventional self-paced learning models that only consider easy samples in each iteration.
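The update pattern described here can be sketched numerically as a weighted least-squares solve, with the $\ell_p$ penalty handled by a standard iteratively reweighted (IRLS-style) diagonal surrogate. The symbols `v`, `m`, and `eta` (self-paced weights, misclassification weights, and their mixing coefficient) are our notation; the exact closed form in the paper may differ.

```python
import numpy as np

def solve_w(X, y, v, m, eta=1.0, lam=0.1, p=2.0, eps=1e-6, n_irls=10):
    """Sketch of the regression-coefficient update: weighted least squares
    where the per-sample weight combines the self-paced weights v and the
    misclassification weights m, and a diagonal matrix D (iteratively
    reweighted) stands in for the l_p penalty on w."""
    s = v + eta * m                          # combined diagonal sample weight
    Xs = X.T * s                             # X^T diag(s)
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_irls):
        # IRLS surrogate: ||w||_p^p ~ w^T D w with D_ii = |w_i|^(p-2)
        D = np.diag((np.abs(w) + eps) ** (p - 2.0))
        w = np.linalg.solve(Xs @ X + lam * D, Xs @ y)
    return w
```

For p = 2 the diagonal D reduces to the identity and the update is an ordinary weighted ridge solve; for p < 2 the reweighting progressively shrinks small coefficients, producing the sparser solutions the text associates with small p.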
(2) The Solution of the Self-Paced Weights. In our work, the self-paced function is defined with two parameters, one describing the lower bound of the admissible sample loss and the other the upper bound; the pace parameter also describes the age of the model. In the initial stage, only easy samples with small loss are used to construct the model. As the pace parameter grows, more and more complicated samples with larger loss are adopted, making the model more mature. The sample weight can be calculated from our self-paced weight function: taking the derivative of (9) with respect to the sample weight, where the loss is the squared loss of the $i$th sample, and setting it to 0 gives the optimal closed-form weight. As mentioned above, we adopt the retrospective self-paced learning framework to increase the robustness and discrimination of the model. Specifically, the step size of the pace parameter in the current self-paced learning pass is smaller than that in the follow-up passes: we set the step size of the first self-paced learning layer to 0.1 and gradually increase it in the follow-up passes. To simplify the calculation, we set the number of layers to 3 in our method.
(3) The Solution of the Misclassification Weights. These weights can be calculated by (1); in our work, we apply the sigmoid function to assign the values of the misclassification weight matrix. Using different self-paced weight functions in (10), we can obtain different models. In detail, we adopt three self-paced learning functions, binary, linear, and logarithmic, formulated as follows.
(a) Binary
(b) Linear
(c) Logarithmic

The solving algorithm of our model is shown in Algorithm 1.
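Since the formulas for the three schemes did not survive in the text above, the following sketch gives the closed-form optimal weights commonly used for these schemes in the SPL literature (binary and linear are standard; the logarithmic form follows Jiang et al.'s self-paced curriculum learning and assumes a pace parameter between 0 and 1). These are stand-ins, not necessarily the paper's exact definitions.

```python
import numpy as np

def spl_weight(loss, lam, scheme="binary"):
    """Closed-form self-paced weights v*(loss; lam) for the three schemes.

    binary: v = 1 if loss < lam else 0
    linear: v = max(0, 1 - loss / lam)
    log:    v = log(loss + zeta) / log(zeta) for loss < lam, zeta = 1 - lam
            (requires 0 < lam < 1)
    """
    loss = np.asarray(loss, dtype=float)
    if scheme == "binary":
        return (loss < lam).astype(float)
    if scheme == "linear":
        return np.maximum(0.0, 1.0 - loss / lam)
    if scheme == "log":
        zeta = 1.0 - lam
        v = np.log(loss + zeta) / np.log(zeta)
        return np.where(loss < lam, v, 0.0)
    raise ValueError(f"unknown scheme: {scheme}")
```

All three satisfy the properties listed in Section 3.1: the weight is 1 at zero loss, decreases with the loss, and vanishes once the loss exceeds the current pace threshold; they differ only in how softly the weight decays.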

4. Experiments
4.1. Settings
To evaluate the effectiveness of our proposed method, we conduct experiments on ten binary classification datasets from the UCI repository and on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Detailed information on the UCI datasets is presented in Table 1. The AD data used in our experiments were obtained from the ADNI database (adni.loni.usc.edu). In our work, the Alzheimer's Disease (AD) data comprise 913 samples with 116 dimensions, consisting of 160 AD patients, 542 MCI patients, and 211 healthy controls (HC). The MCI patients can be divided into three stages: 82 Significant Memory Concern (SMC) patients, 273 Early Mild Cognitive Impairment (EMCI) patients, and 187 Late Mild Cognitive Impairment (LMCI) patients. The ADNI data contain five modalities: ID (serial number), single nucleotide polymorphism (SNP) data, voxel-based morphometry (VBM), fluorodeoxyglucose positron emission tomography (FDG), and F18-florbetapir PET amyloid imaging (AV45). On the ADNI database, we perform three classification tasks: AD versus HC, MCI versus HC, and SMC versus LMCI. In each classification task, we compare our method with the baselines: SVM with an RBF kernel, AdaBoost, and conventional self-paced methods. In its sample processing, AdaBoost adjusts the distribution of training samples based on the performance of the base learners, so that the samples misdirected in the current iteration receive more attention in the next iteration.

For the conventional self-paced learning models whose weight functions are (13), (14), and (15), we write binary, linear, and log for short. Meanwhile, we construct two additional self-paced learning models based on our proposed model: if the misclassification weights are not considered in the retrospective model, we call it EasySPL for short; when our proposed model is a single-level self-paced learner containing the misclassification weights, we call it SingleSPL. To obtain unbiased results, we adopt a 10-fold cross-validation strategy with four measurements: classification accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the receiver operating characteristic curve (AUC). On the UCI databases, we repeat all experiments 30 times with 2-fold cross-validation. In the experiments, we analyze the results of each layer of the self-paced learning process to determine the number of layers: we stop training when the converged results of the current self-paced learning pass are not significantly better than those of the previous pass, which determines the number of layers in our model.
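The four measurements used above can be computed from a confusion matrix and a rank statistic; the following helper shows one way to do so for labels in {0, 1} (the function name and signature are ours, for illustration only).

```python
import numpy as np

def binary_metrics(y_true, y_pred, y_score):
    """ACC, SEN, SPE, and AUC for a binary task with labels in {0, 1}.

    SEN is the recall on positives, SPE the recall on negatives, and
    AUC is computed rank-wise via the Mann-Whitney U statistic.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn)              # sensitivity: true positive rate
    spe = tn / (tn + fp)              # specificity: true negative rate
    scores = np.asarray(y_score, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Fraction of (pos, neg) pairs ranked correctly, ties counted as 1/2
    auc = (np.mean(pos[:, None] > neg[None, :])
           + 0.5 * np.mean(pos[:, None] == neg[None, :]))
    return acc, sen, spe, auc
```

Averaging these four values over the 10 cross-validation folds (and, for UCI, over the 30 repetitions) gives the numbers reported in the tables.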
4.2. Experimental Results on UCI and ADNI Data
First, we verify the effectiveness of introducing tough samples and retrospective self-paced learning into the model on the ten UCI datasets. Figure 3 shows the results of each layer of the proposed self-paced learning method. Clearly, after introducing the misclassification weight matrix, the model behaves more discriminatively in each iteration. Meanwhile, the model of the last self-paced learning pass not only performs better but also behaves more robustly than the previous layer. Table 2 lists the ACC and AUC of the seven baselines and our model: our proposed model achieves the best results on all 10 UCI datasets, with large improvements on several of them.

We compared the proposed method, R2SPL, with several representative classification methods, including SVM, AdaBoost, SPL with the binary, linear, or log function, SingleSPL (SinL), and SPL without tough samples (NSPL). The results are shown in Figure 4. We draw the precision-recall curves of these methods in Figure 5 and present the AUC and ACC results in Table 2. As seen from Figures 4 and 5 and Table 2, our method outperforms the comparison methods on all ten datasets.
We also ran our method and the comparison methods on the ADNI dataset with three classification tasks: AD versus HC, MCI versus HC, and SMC versus LMCI. The comparison results for these three tasks are listed in Tables 3, 4, and 5, respectively. Our proposed method performs better than the other methods in ACC, SEN, SPE, and AUC, which demonstrates its superiority over the other classifiers on AD classification problems.



4.3. Parameter Influence
Our model has two parameters: the regularization term λ and the sparse term p. We test the influence of these two parameters on the 10 UCI datasets. The parameter λ is tuned over a wide range and the sparse term p is adjusted from 0 to 2; when probing the sensitivity of one parameter, we vary only that parameter and fix the other. Figure 6 shows the experimental results. When λ varies over its range, the performance of our proposed model is stable in most cases; when p changes from 0.4 to 2, the performance of our proposed method changes only slightly. Multiple groups of experiments on the 10 UCI datasets verify that our model is not sensitive to the specific parameter values and depends mainly on the structure of the model.
4.4. The Convergence Results of Different Layers of Model
In the convergence analysis, we find that different models have different rates of convergence. The convergence results are shown in Figure 7: as the number of layers increases, the convergence speed of the model also accelerates. Benefiting from the prior knowledge of the previous pass, the current model can reach a local optimal solution faster.
5. Conclusion
In this paper, we divide the samples into easy-learning samples, hard-learning samples, and misclassified samples and analyze their roles in learning. We then introduce tough or misclassified samples into the training of each iteration of self-paced learning. Meanwhile, reflecting the human cognition process, in which people usually need to repeatedly explore and learn from the same data or task over multiple learning stages to gain deep knowledge of it, we design a retrospective framework to improve the robustness of self-paced learning, which uses the model in the previous layer to reduce the negative effect of the small sample size problem in the initialization phase of the next pass. The experimental results show that the proposed method behaves more robustly and discriminatively than conventional self-paced learning methods and many representative methods. In future work, we will extend this framework to other learning tasks, such as semi-supervised learning.
Data Availability
Raw data were generated at Nanjing University of Aeronautics and Astronautics. Derived data supporting the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by National Natural Science Foundation of China (nos. 61501230, 61732006, 61876082, and 61861130366), National Science and Technology Major Project (no. 2018ZX10201002), and the Fundamental Research Funds for the Central Universities (no. NP2018104).
References
[1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum learning," in Proceedings of the 26th International Conference on Machine Learning (ICML 2009), pp. 41–48, Canada, June 2009.
[2] M. P. Kumar, B. Packer, and D. Koller, "Self-paced learning for latent variable models," in Advances in Neural Information Processing Systems, pp. 1189–1197, 2010.
[3] L. Jiang, D. Meng, Q. Zhao, S. Shan, and A. G. Hauptmann, "Self-paced curriculum learning," in Proceedings of the AAAI, vol. 2, p. 6, 2015.
[4] Q. Zhao, D. Meng, L. Jiang, Q. Xie, Z. Xu, and A. G. Hauptmann, "Self-paced learning for matrix factorization," in Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI 2015), pp. 3196–3202, 2015.
[5] J. S. Supancic III and D. Ramanan, "Self-paced learning for long-term tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 2379–2386, June 2013.
[6] C. Ying, M. Qi-Guang, L. Jia-Chen, and G. Lin, "Advance and prospects of AdaBoost algorithm," Acta Automatica Sinica, vol. 39, no. 6, pp. 745–758, 2013.
[7] X. Li, L. Wang, and E. Sung, "AdaBoost with SVM-based component classifiers," Engineering Applications of Artificial Intelligence, vol. 21, no. 5, pp. 785–795, 2008.
[8] G. Tur, D. Hakkani-Tür, and R. E. Schapire, "Combining active and semi-supervised learning for spoken language understanding," Speech Communication, vol. 45, no. 2, pp. 171–186, 2005.
[9] S. Huang, R. Jin, and Z.-H. Zhou, "Active learning by querying informative and representative examples," in Advances in Neural Information Processing Systems, pp. 892–900, 2010.
[10] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[11] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[12] Z. Zhang, Y. Xu, J. Yang, X. Li, and D. Zhang, "A survey of sparse representation: algorithms and applications," IEEE Access, vol. 3, pp. 490–530, 2015.
[13] D.-C. Li, C.-S. Wu, T.-I. Tsai, and Y.-S. Lina, "Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge," Computers & Operations Research, vol. 34, no. 4, pp. 966–982, 2007.
[14] L. Jiang, D. Meng, T. Mitamura, and A. G. Hauptmann, "Easy samples first: self-paced reranking for zero-example multimedia search," in Proceedings of the 2014 ACM Conference on Multimedia (MM 2014), pp. 547–556, USA, November 2014.
[15] S. Zhou, J. Wang, D. Meng et al., "Deep self-paced learning for person re-identification," Pattern Recognition, vol. 76, pp. 739–751, 2018.
[16] Y. Fan, R. He, J. Liang, and B.-G. Hu, "Self-paced learning: an implicit regularization perspective," in Proceedings of the AAAI, vol. 3, p. 4, 2017.
[17] T. Hastie, S. Rosset, J. Zhu, and H. Zou, "Multi-class AdaBoost," Statistics and Its Interface, vol. 2, no. 3, pp. 349–360, 2009.
[18] F. Lv and R. Nevatia, "Recognition and segmentation of 3-D human action using HMM and multi-class AdaBoost," in Proceedings of the European Conference on Computer Vision, pp. 359–372, Springer, 2006.
[19] P. Viola and M. Jones, "Fast and robust classification using asymmetric AdaBoost and a detector cascade," in Advances in Neural Information Processing Systems, pp. 1311–1318, 2002.
[20] G. Rätsch, T. Onoda, and K. R. Müller, "Soft margins for AdaBoost," Machine Learning, vol. 42, no. 3, pp. 287–320, 2001.
[21] J. H. Morra, Z. Tu, L. G. Apostolova, A. E. Green, A. W. Toga, and P. M. Thompson, "Comparison of AdaBoost and support vector machines for detecting Alzheimer's disease through automated hippocampal segmentation," IEEE Transactions on Medical Imaging, vol. 29, no. 1, pp. 30–43, 2010.
[22] L. Lin, K. Wang, D. Meng, W. Zuo, and L. Zhang, "Active self-paced learning for cost-effective and progressive face identification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 1, pp. 7–19, 2018.
[23] S. Tong and D. Koller, "Support vector machine active learning with applications to text classification," Journal of Machine Learning Research, vol. 2, pp. 45–66, 2001.
[24] G. Schohn and D. Cohn, "Less is more: active learning with support vector machines," in Proceedings of the ICML, pp. 839–846, 2000.
[25] S. Tong and E. Chang, "Support vector machine active learning for image retrieval," in Proceedings of the 9th ACM International Conference on Multimedia, pp. 107–118, October 2001.
[26] P. Mitra, B. U. Shankar, and S. K. Pal, "Segmentation of multispectral remote sensing images using active support vector machines," Pattern Recognition Letters, vol. 25, no. 9, pp. 1067–1074, 2004.
[27] A. L. Brown and M. J. Kane, "Preschool children can learn to transfer: learning to learn and learning from example," Cognitive Psychology, vol. 20, no. 4, pp. 493–523, 1988.
[28] M. E. Taylor and P. Stone, "Transfer learning for reinforcement learning domains: a survey," Journal of Machine Learning Research, vol. 10, pp. 1633–1685, 2009.
[29] H. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
[30] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: a neural image caption generator," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 3156–3164, June 2015.
Copyright
Copyright © 2019 Qi Zhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.