Scientific Programming
Volume 2017, Article ID 4382348, 11 pages
Research Article

Using Hierarchical Latent Dirichlet Allocation to Construct Feature Tree for Program Comprehension

1School of Information Engineering, Yangzhou University, Yangzhou, China
2Tongda College of Nanjing University of Posts and Telecommunications, Nanjing, China
3Hainan University, Haikou, China

Correspondence should be addressed to Bin Li; lb@yzu.edu.cn

Received 9 November 2016; Revised 10 February 2017; Accepted 26 March 2017; Published 12 April 2017

Academic Editor: Michele Risi

Copyright © 2017 Xiaobing Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Program comprehension is an important task that developers face during software maintenance, and it becomes increasingly difficult as evolving systems grow more complex. In practice, programmers first get a general view of the features in a software system and then locate the interesting or necessary files from which to start the understanding process. The traditional view of a system is its package-class structure, which is difficult to understand, especially for large systems. In this article, we focus on understanding a system from both a feature view and a file structure view. We propose an approach that uses hierarchical Latent Dirichlet Allocation (hLDA) to generate a feature tree comprising two hierarchies: a feature hierarchy and a file structure hierarchy. The feature hierarchy presents the features from the abstract level to the detailed level, while the file structure hierarchy presents the classes from whole to part. Empirical results show that the feature tree can produce a view of the features and files, and that our approach clusters the classes in a package better, in terms of recall, than another clustering approach, namely hierarchical clustering.
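To make the idea of a feature tree with two hierarchies more concrete, the following is a minimal sketch of one possible data structure, not the representation used in the article. It assumes that each node carries a few topic words (the feature hierarchy) and that source files are attached to nodes so that a parent covers all files of its descendants (the file structure hierarchy). The class name `FeatureNode`, its fields, and the sample topic words and file names are all hypothetical, and the sketch does not run hLDA itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureNode:
    """Hypothetical feature-tree node: a topic plus the files assigned to it."""
    words: List[str]                                   # topic words describing the feature
    files: List[str] = field(default_factory=list)     # files attached directly to this node
    children: List["FeatureNode"] = field(default_factory=list)

    def all_files(self) -> List[str]:
        """Files covered by this feature: its own plus those of every descendant."""
        found = list(self.files)
        for child in self.children:
            found.extend(child.all_files())
        return found

    def render(self, depth: int = 0) -> List[str]:
        """Indented outline, from the abstract root down to detailed features."""
        label = ", ".join(self.words)
        lines = ["  " * depth + f"{label} ({len(self.all_files())} files)"]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# Made-up example data, for illustration only.
root = FeatureNode(
    words=["editor", "document"],
    children=[
        FeatureNode(words=["search", "replace"],
                    files=["SearchDialog.java", "Matcher.java"]),
        FeatureNode(words=["syntax", "highlight"],
                    files=["TokenMarker.java"]),
    ],
)

print("\n".join(root.render()))
```

Reading the outline top-down follows the feature hierarchy from abstract to detailed, while `all_files()` on any node gives the corresponding part of the file structure hierarchy.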