Scientific Programming
Volume 2017, Article ID 3787053, 15 pages
Research Article

Clustering Classes in Packages for Program Comprehension

1School of Information Engineering, Yangzhou University, Yangzhou, China
2State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
3School of Computer Science and Engineering, Southeast University, Nanjing, China
4School of Information Systems, Singapore Management University, Singapore
5Nanjing University of Information Science & Technology, Nanjing, China

Correspondence should be addressed to Bin Li; lb@yzu.edu.cn

Received 16 October 2016; Revised 13 February 2017; Accepted 27 February 2017; Published 11 April 2017

Academic Editor: Xuanhua Shi

Copyright © 2017 Xiaobing Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


During software maintenance and evolution, developers must understand a system quickly and accurately. As an evolving system grows in size and complexity, program comprehension becomes increasingly difficult. Given a target system, developers may first try to comprehend it at the package level. Packages vary in size: small packages are easy to comprehend, but large packages are difficult to understand. In this article, we focus on these large packages and propose a novel program comprehension approach that uses the Latent Dirichlet Allocation (LDA) model to cluster the classes in each large package. Each large package is thereby separated into small clusters, which are easier for developers to comprehend. Empirical studies on four real-world software projects demonstrate the effectiveness of our approach. The results show that our approach is more effective than clustering approaches based on Latent Semantic Indexing (LSI) and Probabilistic Latent Semantic Analysis (PLSA). In addition, we find that the topic that labels each cluster is useful for program comprehension.