Scientific Programming
Volume 2017, Article ID 3787053, 15 pages
https://doi.org/10.1155/2017/3787053
Research Article

Clustering Classes in Packages for Program Comprehension

1 School of Information Engineering, Yangzhou University, Yangzhou, China
2 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
3 School of Computer Science and Engineering, Southeast University, Nanjing, China
4 School of Information Systems, Singapore Management University, Singapore
5 Nanjing University of Information Science & Technology, Nanjing, China

Correspondence should be addressed to Bin Li; lb@yzu.edu.cn

Received 16 October 2016; Revised 13 February 2017; Accepted 27 February 2017; Published 11 April 2017

Academic Editor: Xuanhua Shi

Copyright © 2017 Xiaobing Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. X. Peng, Z. Xing, X. Tan, Y. Yu, and W. Zhao, “Improving feature location using structural similarity and iterative graph mapping,” Journal of Systems and Software, vol. 86, no. 3, pp. 664–676, 2013.
  2. J. Wang, X. Peng, Z. Xing, and W. Zhao, “Improving feature location practice with multi-faceted interactive exploration,” in Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 762–771, IEEE, San Francisco, Calif, USA, May 2013.
  3. M. P. O'Brien, “Software comprehension: a review and research direction,” Tech. Rep., 2003.
  4. Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalized search over encrypted outsourced data with efficiency improvement,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2016.
  5. E. Soloway and K. Ehrlich, “Empirical studies of programming knowledge,” IEEE Transactions on Software Engineering, vol. 10, no. 5, pp. 595–609, 1984.
  6. W. Maalej, R. Tiarks, T. Roehm, and R. Koschke, “On the comprehension of program comprehension,” ACM Transactions on Software Engineering and Methodology, vol. 23, no. 4, article 31, 2014.
  7. P. Andritsos and V. Tzerpos, “Information-theoretic software clustering,” IEEE Transactions on Software Engineering, vol. 31, no. 2, pp. 150–165, 2005.
  8. S. Mancoridis, B. S. Mitchell, C. Rorres, Y. Chen, and E. R. Gansner, “Using automatic clustering to produce high-level system organizations of source code,” in Proceedings of the 6th International Workshop on Program Comprehension (IWPC '98), p. 45, Ischia, Italy, June 1998.
  9. N. Anquetil and T. Lethbridge, “Experiments with clustering as a software remodularization method,” in Proceedings of the 6th Working Conference on Reverse Engineering (WCRE '99), pp. 235–255, IEEE, October 1999.
  10. V. Rajlich and N. Wilde, “The role of concepts in program comprehension,” in Proceedings of the 10th International Workshop on Program Comprehension (IWPC '02), pp. 271–278, IEEE, Paris, France, June 2002.
  11. Z. Zhou, Y. Wang, Q. M. Wu, C. Yang, and X. Sun, “Effective and efficient global context verification for image copy detection,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 1, pp. 48–63, 2017.
  12. A. Kuhn, S. Ducasse, and T. Gîrba, “Semantic clustering: identifying topics in source code,” Information & Software Technology, vol. 49, no. 3, pp. 230–243, 2007.
  13. S. C. Choi and W. Scacchi, “Extracting and restructuring the design of large systems,” IEEE Software, vol. 7, no. 1, pp. 66–71, 1990.
  14. Y. S. Maarek, D. M. Berry, and G. E. Kaiser, “An information retrieval approach for automatically constructing software libraries,” IEEE Transactions on Software Engineering, vol. 17, no. 8, pp. 800–813, 1991.
  15. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
  16. X. Sun, X. Liu, B. Li, Y. Duan, H. Yang, and J. Hu, “Exploring topic models in software engineering data analysis: a survey,” in Proceedings of the 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD '16), pp. 357–362, IEEE, Shanghai, China, June 2016.
  17. B. Gu, V. S. Sheng, K. Y. Tay, W. Romano, and S. Li, “Incremental support vector learning for ordinal regression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 7, pp. 1403–1416, 2015.
  18. B. Gu, V. S. Sheng, Z. Wang, D. Ho, S. Osman, and S. Li, “Incremental learning for ν-support vector regression,” Neural Networks, vol. 67, pp. 140–150, 2015.
  19. J. Tang, Z. Meng, X. Nguyen, Q. Mei, and M. Zhang, “Understanding the limiting factors of topic modeling via posterior contraction analysis,” in Proceedings of the 31st International Conference on Machine Learning, pp. 190–198, 2014.
  20. A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia, “How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms,” in Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 522–531, IEEE, May 2013.
  21. Y. Zhang, X. Sun, and B. Wang, “Efficient algorithm for k-barrier coverage based on integer linear programming,” China Communications, vol. 13, no. 7, pp. 16–23, 2016.
  22. Q. Liu, W. Cai, J. Shen, Z. Fu, X. Liu, and N. Linge, “A speculative approach to spatial-temporal efficiency with multi-objective optimization in a heterogeneous cloud environment,” Security and Communication Networks, vol. 9, no. 17, pp. 4002–4012, 2016.
  23. D. Binkley, D. Heinz, D. Lawrie, and J. Overfelt, “Understanding LDA in source code analysis,” in Proceedings of the 22nd International Conference on Program Comprehension (ICPC '14), pp. 26–36, June 2014.
  24. T. Mens, A. Serebrenik, and A. Cleve, Eds., Evolving Software Systems, Springer, 2014.
  25. F. Longo, R. Tiella, P. Tonella, and A. Villafiorita, “Measuring the impact of different categories of software evolution,” in Software Process and Product Measurement, International Conferences: IWSM 2008, Metrikon 2008, and Mensura 2008, pp. 344–351, 2008.
  26. B. Dit, L. Guerrouj, D. Poshyvanyk, and G. Antoniol, “Can better identifier splitting techniques help feature location?” in Proceedings of the IEEE 19th International Conference on Program Comprehension (ICPC '11), pp. 11–20, IEEE, Ontario, Canada, June 2011.
  27. T. Fritz, G. C. Murphy, E. Murphy-Hill, J. Ou, and E. Hill, “Degree-of-knowledge: modeling a developer's knowledge of code,” ACM Transactions on Software Engineering and Methodology, vol. 23, no. 2, article 14, 2014.
  28. X. Sun, X. Liu, J. Hu, and J. Zhu, “Empirical studies on the NLP techniques for source code data preprocessing,” in Proceedings of the 3rd International Workshop on Evidential Assessment of Software Technologies (EAST '14), pp. 32–39, May 2014.
  29. G. Santos, M. T. Valente, and N. Anquetil, “Remodularization analysis using semantic clustering,” in Proceedings of the Software Evolution Week—IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE '14), pp. 224–233, Antwerp, Belgium, February 2014.
  30. T. Hofmann, “Unsupervised learning by probabilistic latent semantic analysis,” Machine Learning, vol. 42, no. 1-2, pp. 177–196, 2001.
  31. T. Hofmann, “Probabilistic latent semantic analysis,” in Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 289–296, Stockholm, Sweden, July 1999, https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&proceeding_id=15&article_id=179.
  32. Y. Liu, D. Poshyvanyk, R. Ferenc, T. Gyimóthy, and N. Chrisochoides, “Modeling class cohesion as mixtures of latent topics,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '09), pp. 233–242, Alberta, Canada, September 2009.
  33. M. Shtern and V. Tzerpos, “Clustering methodologies for software engineering,” Advances in Software Engineering, vol. 2012, Article ID 792024, 18 pages, 2012.
  34. M. P. Robillard and G. C. Murphy, “Representing concerns in source code,” ACM Transactions on Software Engineering and Methodology, vol. 16, no. 1, article 3, 2007.
  35. D. Binkley, M. Ceccato, M. Harman, F. Ricca, and P. Tonella, “Tool-supported refactoring of existing object-oriented code into aspects,” IEEE Transactions on Software Engineering, vol. 32, no. 9, pp. 698–717, 2006.
  36. S. Deerwester, “Improving information retrieval with latent semantic indexing,” in Proceedings of the Annual Meeting of the American Society for Information Science, pp. 1–10, 1988.
  37. D. Poshyvanyk, M. Gethers, and A. Marcus, “Concept location using formal concept analysis and information retrieval,” ACM Transactions on Software Engineering and Methodology, vol. 21, no. 4, pp. 1–34, 2012.
  38. J. I. Maletic and A. Marcus, “Supporting program comprehension using semantic and structural information,” in Proceedings of the 23rd International Conference on Software Engineering, pp. 103–112, May 2001.
  39. J. Han, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, Calif, USA, 2005.
  40. X. Sun, B. Li, Y. Li, and Y. Chen, “What information in software historical repositories do we need to support software maintenance tasks? An approach based on topic model,” in Computer and Information Science, pp. 27–37, Springer International Publishing, 2015.
  41. C. J. van Rijsbergen, Information Retrieval, Butterworths, London, UK, 1979.
  42. U. Erdemir, U. Tekin, and F. Buzluca, “Object oriented software clustering based on community structure,” in Proceedings of the 18th Asia Pacific Software Engineering Conference (APSEC '11), pp. 315–321, IEEE, Ho Chi Minh, Vietnam, December 2011.
  43. A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, and S. Panichella, “Using IR methods for labeling source code artifacts: is it worthwhile?” in Proceedings of the 20th IEEE International Conference on Program Comprehension (ICPC '12), pp. 193–202, June 2012.
  44. X. Liu, X. Sun, B. Li, and J. Zhu, “PFN: a novel program feature network for program comprehension,” in Proceedings of the 13th IEEE/ACIS International Conference on Computer and Information Science (ICIS '14), pp. 349–354, Taiyuan, China, June 2014.
  45. Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2016.
  46. B. S. Mitchell and S. Mancoridis, “On the automatic modularization of software systems using the Bunch tool,” IEEE Transactions on Software Engineering, vol. 32, no. 3, pp. 193–208, 2006.
  47. S. Islam, J. Krinke, D. Binkley, and M. Harman, “Coherent clusters in source code,” Journal of Systems and Software, vol. 88, no. 1, pp. 1–24, 2014.
  48. S. Mirarab, A. Hassouna, and L. Tahvildari, “Using Bayesian belief networks to predict change propagation in software systems,” in Proceedings of the 15th IEEE International Conference on Program Comprehension (ICPC '07), pp. 177–186, June 2007.
  49. F. Deng and J. A. Jones, “Weighted system dependence graph,” in Proceedings of the 5th IEEE International Conference on Software Testing, Verification and Validation (ICST '12), pp. 380–389, Montreal, Canada, April 2012.
  50. M. Gethers, A. Aryani, and D. Poshyvanyk, “Combining conceptual and domain-based couplings to detect database and code dependencies,” in Proceedings of the IEEE 12th International Working Conference on Source Code Analysis and Manipulation (SCAM '12), pp. 144–153, IEEE, Trento, Italy, September 2012.
  51. Z. Fu, X. Sun, Q. Liu, L. Zhou, and J. Shu, “Achieving efficient cloud search services: multi-keyword ranked search over encrypted cloud data supporting parallel computing,” IEICE Transactions on Communications, vol. E98B, no. 1, pp. 190–200, 2015.
  52. Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.
  53. L. Guerrouj, “Normalizing source code vocabulary to support program comprehension and software quality,” in Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 1385–1388, San Francisco, Calif, USA, May 2013.
  54. A. De Lucia, M. Di Penta, and R. Oliveto, “Improving source code lexicon via traceability and information retrieval,” IEEE Transactions on Software Engineering, vol. 37, no. 2, pp. 205–227, 2011.
  55. N. Anquetil and T. C. Lethbridge, “Recovering software architecture from the names of source files,” Journal of Software Maintenance and Evolution, vol. 11, no. 3, pp. 201–221, 1999.
  56. K. Sartipi and K. Kontogiannis, “A user-assisted approach to component clustering,” Journal of Software Maintenance and Evolution, vol. 15, no. 4, pp. 265–295, 2003.
  57. T. Ma, J. Zhou, M. Tang et al., “Social network and tag sources based augmenting collaborative recommender system,” IEICE Transactions on Information and Systems, vol. E98-D, no. 4, pp. 902–910, 2015.
  58. G. Santos, M. T. Valente, and N. Anquetil, “Remodularization analysis using semantic clustering,” in Proceedings of the 1st Software Evolution Week—IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE '14), pp. 224–233, February 2014.
  59. Z. Xia, X. Wang, X. Sun, and B. Wang, “Steganalysis of least significant bit matching using multi-order differences,” Security and Communication Networks, vol. 7, no. 8, pp. 1283–1291, 2014.
  60. S. Kawaguchi, P. K. Garg, M. Matsushita, and K. Inoue, “MUDABlue: an automatic categorization system for open source repositories,” in Proceedings of the 11th Asia-Pacific Software Engineering Conference (APSEC '04), pp. 184–193, Busan, Republic of Korea, December 2004.
  61. A. Kuhn, S. Ducasse, and T. Gîrba, “Enriching reverse engineering with semantic clustering,” in Proceedings of the 12th Working Conference on Reverse Engineering (WCRE '05), pp. 133–142, Pittsburgh, Pa, USA, November 2005.
  62. A. Corazza, S. Di Martino, V. Maggio, and G. Scanniello, “Investigating the use of lexical information for software system clustering,” in Proceedings of the 15th European Conference on Software Maintenance and Reengineering (CSMR '11), pp. 35–44, IEEE, Oldenburg, Germany, March 2011.
  63. G. Scanniello, M. Risi, and G. Tortora, “Architecture recovery using Latent Semantic Indexing and k-Means: an empirical evaluation,” in Proceedings of the 8th IEEE International Conference on Software Engineering and Formal Methods (SEFM '10), pp. 103–112, September 2010.
  64. G. Scanniello, A. D'Amico, C. D'Amico, and T. D'Amico, “Using the Kleinberg algorithm and vector space model for software system clustering,” in Proceedings of the 18th IEEE International Conference on Program Comprehension (ICPC '10), pp. 180–189, IEEE, Braga, Portugal, June-July 2010.
  65. G. Scanniello and A. Marcus, “Clustering support for static concept location in source code,” in Proceedings of the IEEE 19th International Conference on Program Comprehension (ICPC '11), pp. 36–40, Kingston, Canada, June 2011.
  66. Y. Kong, M. Zhang, and D. Ye, “A belief propagation-based method for task allocation in open and dynamic cloud environments,” Knowledge-Based Systems, vol. 115, pp. 123–132, 2017.
  67. A. M. Saeidi, J. Hage, R. Khadka, and S. Jansen, “A search-based approach to multi-view clustering of software systems,” in Proceedings of the 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER '15), pp. 429–438, March 2015.