Computational Intelligence and Neuroscience
Volume 2016, Article ID 7386517, 13 pages
http://dx.doi.org/10.1155/2016/7386517
Research Article

Stratification-Based Outlier Detection over the Deep Web

1 Department of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215002, China
2 School of Computer Engineering, Suzhou Vocational University, Suzhou, Jiangsu 215104, China
3 Computer Science Department, University of Central Arkansas, Conway, AR 72035, USA

Received 3 November 2015; Revised 15 March 2016; Accepted 6 April 2016

Academic Editor: Leonardo Franco

Copyright © 2016 Xuefeng Xian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
