Special Issue

Solving Engineering and Science Problems Using Complex Bio-inspired Computation Approaches


Research Article | Open Access

Volume 2020 |Article ID 1794947 | https://doi.org/10.1155/2020/1794947

Masoud Aghdasifam, Habib Izadkhah, Ayaz Isazadeh, "A New Metaheuristic-Based Hierarchical Clustering Algorithm for Software Modularization", Complexity, vol. 2020, Article ID 1794947, 25 pages, 2020. https://doi.org/10.1155/2020/1794947

A New Metaheuristic-Based Hierarchical Clustering Algorithm for Software Modularization

Guest Editor: Narayanan Kumarappan
Received: 07 May 2020
Revised: 10 Jul 2020
Accepted: 20 Jul 2020
Published: 30 Sep 2020

Abstract

Software refactoring is a software maintenance action to improve the software internal quality without changing its external behavior. During the maintenance process, structural refactoring is performed by remodularizing the source code. Software clustering is a modularization technique to remodularize artifacts of source code aiming to improve readability and reusability. Due to the NP hardness of the clustering problem, evolutionary approaches such as the genetic algorithm have been used to solve this problem. In the structural refactoring literature, there exists no search-based algorithm that employs a hierarchical approach for modularization. Utilizing global and local search strategies, in this paper, a new search-based top-down hierarchical clustering approach, named TDHC, is proposed that can be used to modularize the system. The output of the algorithm is a tree in which each node is an artifact composed of all artifacts in its subtrees and is a candidate to be a software module (i.e., cluster). This tree helps a software maintainer to have better vision on source code structure to decide appropriate composition points of artifacts aiming to create modules (i.e., files, packages, and components). Experimental results on seven folders of Mozilla Firefox with different functionalities and five other software systems show that the TDHC produces modularization closer to the human expert’s decomposition (i.e., directory structure) than the other existing algorithms. The proposed algorithm is expected to help a software maintainer for better remodularization of a source code. The source codes and dataset related to this paper can be accessed at https://github.com/SoftwareMaintenanceLab.

1. Introduction

Software maintenance is the process of modifying a software product after releasing it to reduce faults, improve performance, or improve the design. Software maintenance tasks are important for future software development and consume approximately 90 percent of the total cost [1].

In software maintenance, changes such as adding, deleting, or modifying code cause code blocks to grow and make the code harder to understand later. Code smells (or bad code smells) are parts of the source code that do not cause faults in external behavior and pose no significant problem for internal behavior at the moment but may cause issues in future development [2]. Software refactoring modifies the source code to rectify code smells without any change in the external behavior of the system. It improves the quality of the source code by reducing the potential occurrence of bugs and keeping the code easy to maintain and extend.

Fowler et al. reported some possible code smells in their book [3] for object-oriented systems and proposed possible refactoring scenarios for them. Since then, many studies have proposed new refactoring scenarios or validated the effects of applying various scenarios to the source code to achieve better quality.

Refactoring techniques are classified into two major groups: conceptual and structural. For example, rename method refactoring is a conceptual refactoring scenario that changes the name of a method to better explain its responsibility. Some structural refactoring scenarios concern the composition of methods or functions. For example, long code blocks usually have multiple responsibilities or duplicated blocks that should be refactored. Other structural refactoring scenarios improve the functionality of code blocks. As an example, move method refactoring (MMR) is defined as moving a method from one class to the class with which it is most related. The relation between methods can be a structural relation, such as a call, or a semantic relation. There are also composite refactorings, defined as sequences of primitive refactorings that reflect complex transformations.

To illustrate a structural refactoring task, Figure 1 depicts an example modularization for a small software system. In this figure, each node is a class and edges represent a collaboration between the classes. These classes are separated into two modules according to their collaborations. Figure 2 shows several changes on this software after some maintenance actions. As shown, the relations between nodes are changed and also a new class “I” is added to the system. In Figure 2, relations of the node “G” with the nodes in the left module are more than relations in the right module. So it is necessary to relocate the position of this node (and node “I”) by a remodularization. The result of remodularization is shown in Figure 3.

Manually analyzing the source code for refactoring is a costly and time-consuming process. Hence, much research has been done on automatic refactoring. One approach to structural refactoring is remodularization, as shown in Figure 2, which is performed by clustering techniques. According to [4], “The aim of the software clustering process is to partition a software system into modules (subsystems or packages), where a module is made up of a set of software artifacts which collaborate with each other to implement a high-level attribute or provide a high-level service for the rest of the software system.” The input of a clustering algorithm is an artifact dependency graph (ADG), where the nodes indicate artifacts and the edges show the relationships between artifacts. An artifact can be an entity such as a function, a file, a software class, or even a collection of classes (a so-called package) or the files in a source code folder. The relations between artifacts can be derived from structural features like calls or from nonstructural features like semantic relations. Figure 4 shows an example of clustering in which the artifacts of a small compiler are partitioned into four modules (clusters) according to their relations. These modules are expected to have maximum cohesion and minimum coupling with other modules [6, 7].
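To make the ADG and the cohesion/coupling goal concrete, the following minimal sketch stores an ADG as a list of directed call edges and counts the edges that fall inside a module versus those that cross module boundaries. The artifact names, the module labels, and the helper `count_intra_inter` are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict

def count_intra_inter(edges, module_of):
    """Count edges inside each module (cohesion) and edges crossing modules (coupling)."""
    intra = defaultdict(int)
    inter = 0
    for src, dst in edges:
        if module_of[src] == module_of[dst]:
            intra[module_of[src]] += 1
        else:
            inter += 1
    return dict(intra), inter

# Hypothetical ADG: nodes are artifacts, directed edges are calls.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E"), ("E", "F"), ("C", "D")]
# Hypothetical modularization: each artifact is assigned to one module.
module_of = {"A": "M1", "B": "M1", "C": "M1", "D": "M2", "E": "M2", "F": "M2"}

intra, inter = count_intra_inter(edges, module_of)
print(intra, inter)
```

A modularization with many intra-module edges and few inter-module edges is the kind of result a clustering algorithm is expected to favor.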

Current clustering strategies for obtaining a proper modularization fall into two major groups: hierarchical and nonhierarchical techniques. In hierarchical methods, a tree of relations is constructed from the artifacts at the leaves up to the root. These techniques give developers a hierarchical view for deciding the number of modules and the appropriate cutpoint in the tree to construct them. Most hierarchical methods presented for software clustering are agglomerative (bottom-up). In such algorithms, each artifact starts in its own cluster; the proximity between all clusters is calculated based on a certain criterion, e.g., Jaccard, and the pairs of clusters with the highest proximity are merged as one moves up the hierarchy [8]. The main limitations of hierarchical algorithms are as follows [8]:
(1) Due to the presence of zigzag, the whole tree must be built to the end in order to identify modules.
(2) There exists no well-defined criterion to decide where the clustering process should stop.
(3) Arbitrary decisions are one of the main problems in hierarchical clustering methods. These decisions have a significant impact on the final clustering. Once a wrong choice has been made at an arbitrary decision, it cannot be reversed or corrected.
(4) These algorithms are greedy and hence cannot explore the problem space well. Several previous studies [9–11] have shown that these methods do not perform well in software clustering.
Moreover, there is no hierarchical clustering algorithm that proposes cut points from different levels of the dendrogram.

There are also nonhierarchical modularization methods based on search-based approaches which explore solution space by global search or local search algorithms. But these methods do not give the developer a vision about upper-level relationships between modules.

In the literature, because of the NP-hardness of the clustering problem, search-based methods (such as the genetic algorithm) have been widely used [8, 12]. Because of their exploration and exploitation ability, they are an effective way to solve the clustering problem [13]. Currently, search-based works on software refactoring through remodularization operate in flat mode (i.e., they are nonhierarchical methods) and do not offer an appropriate composition at higher levels.

1.1. The Problem

In this paper, we focus on a specific restructuring problem in the context of object-oriented and procedural programs: given an ADG constructed from an existing code, decompose it into smaller and meaningful modules that have a higher cohesion and lower coupling. Cohesion is defined as “the degree to which the internal contents of a module are related” [1]. Our method supports “big-bang” remodularization; i.e., all the artifacts of the software system are considered for remodularization.

The main problem addressed in this paper is to suggest a possible hierarchical remodularization for a source code, while keeping that accurate in terms of proximity to (human) expert decomposition. In this paper, a hierarchical top-down clustering algorithm is proposed to structurally refactor the source code from its artifact dependency graph (ADG) with a branch and bound approach. The aim is to find the appropriate composition tree and recommend the lowest appropriate levels to merge artifacts as a module. It, therefore, will be easier for the developer to recognize the position of the different levels, such as files, packages, or components. In the proposed method, a genetic algorithm (GA) along with a neighboring search algorithm is designed to search in trees of the composition of artifacts. The proposed algorithm is evaluated on seven folders of Mozilla Firefox and five other open-source systems. The results indicate that the method is able to propose an acceptable refactoring by hierarchical remodularization of artifacts, by giving a vision about high-level relation between modules for developers.

1.2. Contribution

The contributions of this paper are summarized as follows:
(1) A new software refactoring method with a top-down hierarchical modularization technique is proposed. The output of the algorithm is a tree generated from the source code, which helps a software maintainer to have a better view of the source code structure and to decide appropriate composition points of artifacts aiming to create modules (i.e., files, packages, and components). It is important to note that, in the literature, there exists no search-based algorithm that employs a hierarchical approach for modularization.
(2) The Prufer sequence is utilized in the GA for encoding trees. Existing encoding methods used in software modularization are real-based (e.g., BUNCH [5], ECA [12], and SGA [14]) or permutation-based (e.g., DAGC [15] and E-CDGM [16]), and these methods show only a flat modularization.
(3) A new objective function is proposed to evaluate hierarchical remodularization.

The rest of the paper is organized as follows: in Section 2, some research studies on software refactoring are discussed; Section 3 introduces the proposed algorithm, and in Section 4, experimental results are presented. The results of the research and the threats to validity are discussed in Sections 4 and 5, respectively. Finally, Section 6 presents the conclusions of this research and future work.

After the publication of Fowler’s book [3] on source code refactoring, many studies have refined the concepts of this reference and proposed automated solutions for detecting and repairing code smells, e.g., [17–20].

Remodularization of source code artifacts is an approach for structural refactoring. Because the solution space for modularization is large, many search-based research studies have been done. In the Bunch algorithm [5, 7, 21], a GA, namely, Bunch-GA, and two hill-climbing algorithms, namely, Bunch-NAHC and Bunch-SAHC, are utilized to search the solution space. In this algorithm, the size of the solution space is $n^n$ ($n$ is the number of artifacts), in which many solutions represent the same modularization. Parsa and Bushehrian introduced the DAGC encoding [15] to solve this problem, which reduces the state space to $n!$. Tajgardan et al. [22] presented an algorithm based on the estimation of distribution algorithm (EDA), which avoids the challenge of specifying the parameters of a GA. Izadkhah et al. [16] presented the E-CDGM method, which first converts the source code to an intermediate code called mCode from the call dependency graph (CDG) and then proposes a modularization with a fitness function (using class-property, class-method, and method-method relations), a self-automata algorithm, and DAGC encoding. Amarjeet et al. [23] presented the MaABC algorithm for software modularization, which is a multiobjective optimization method using the bee population algorithm. They also presented PSOMC [24], a PSO-based module clustering method, which partitions a software system by optimizing intracluster dependency, intercluster dependency, the number of clusters, and the number of modules per cluster.

Recent research on multiobjective search methods has expanded. Praditwong et al. [12] presented two approaches, the equal-size cluster approach (ECA) and the maximizing cluster approach (MCA), for software modularization using a multiobjective genetic algorithm and Pareto optimality. Harman and Tratt [25] also used Pareto optimality to combine two metrics: CBO [26] and a new metric called SDMPC. Seng et al. [27] proposed a GA-based approach to suggest refactorings with a fitness function composed of coupling, cohesion, complexity, and stability. Kebir et al. [28] presented a genetic algorithm-based approach that detects component-relevant code smells and eliminates them by searching for the best sequence of refactorings. In [29], Kumari and Srinivas proposed MHypEA (multiobjective hyperheuristic evolutionary algorithm) to suggest software module clusters while maximizing cohesion and minimizing coupling of the software modules. It is based on different methods of selection, crossover, and mutation operations of evolutionary algorithms, and the mechanism that selects a low-level heuristic is based on reinforcement learning with adaptive weights.

In [30], Huang and Liu introduced a new objective function called MS to automatically guide optimization algorithms to find a good partition of software systems which consider both global modules and edge directions. Then, three modularization algorithms named HC-SMCP, GA-SMCP, and MAEA-SMCP are proposed in this paper which are adopted to optimize MS for software systems.

Bavota et al. have published several studies on refactoring. In [31], a new technique is proposed for the automatic remodularization of packages, which uses structural and semantic measures to decompose a package into smaller, more cohesive ones. The results showed that the decomposed packages have better cohesion without deterioration of coupling, and the remodularization proposed by the tool is also meaningful from a functional point of view. In [32], they introduced a tool called R3 that automatically analyzes the underlying latent topics inferred from identifiers, comments, and string literals in the source code classes as well as the structural dependencies among these classes. In [33], they presented a method for extract class refactoring based on three structural and semantic factors, SSM [34], CDM [35], and CSM [36], which strongly increases the cohesion of the refactored classes without a significant increase in coupling. In [37], they proposed a technique based on relational topic models to identify MMR opportunities.

Maletic and Marcus [38] proposed an algorithm which uses semantic and structural data to propose refactoring decisions. In [39], Palomba et al. presented a technique, called TACO (textual analysis for code smell detection), that exploits textual analysis to detect a family of smells of different natures and different levels of granularity.

Jalali et al. [8] proposed a new multiobjective fitness function for modularization, named MOF, which uses the structural and nonstructural features with EoD algorithm. In [40], a new deterministic clustering algorithm named neighborhood tree algorithm is presented which creates a neighborhood tree using available knowledge in an ADG. Mahouachi [41] proposed a method which used NSGA-II [42] to find the best sequence of refactorings that maximize structural quality, maximize semantic cohesiveness of packages, and minimize the refactoring effort that is able to produce a coherent and useful sequence of recommended refactorings both in terms of quality metrics and from the developer’s points of view. Ouni et al. [43] proposed a new refactoring recommendation, called MORE, to improve design quality and fix code smells using NSGA-III [42]. Dallal [44] introduced a measure to precisely predict whether a class includes methods in need of MMR. Me et al. [45] presented a new mathematical programming model for the software remodularization problem with a novel metric based on the principle of complexity balance and a hybrid genetic algorithm (HGA).

Kargar et al. have conducted research on the remodularization of multiprogramming language software systems. In [14], they presented two dependency graphs called the semantic dependency graph (SDG) and the nominal similarity graph (NSG). Both of these graphs are constructed independently of programming language syntax. The SDG is constructed based on all nouns of the source code, and the NSG is constructed based on the similarity between artifact names. Then, in [46], they proposed a genetic algorithm to modularize programs by combining the constructed dependency graphs (i.e., call dependency graph, semantic dependency graph, and nominal similarity graph).

In summary, search-based algorithms are described in three aspects. One aspect is the scope of the search (local strategy and global strategy). Some algorithms are based on local search strategy, and the result may not be the optimal solution. Global search techniques always aim to find good solutions. Single objective or multiobjective is another grouping for search algorithms. In multiobjective algorithms, there are multiple functions or metrics aiming to guide the search process. The last aspect is to use semantic features vs structured features for clustering. In semantic search optimizations, lexical analysis or latent semantic analysis (or both) is considered in search progress. In structural features, the function call between two artifacts, inheritance, etc. is considered for clustering. Some search-based clustering algorithms are shown in Table 1.


Table 1: Some search-based clustering algorithms.

| Name | Type of algorithm | Type of objective function | Structural/semantic features |
|---|---|---|---|
| Bunch-NAHC and Bunch-SAHC [7, 21] | LS | SO | S |
| Multiple hill-climbing approach [47] | LS | SO | S |
| E-CDGM [16] | LS | SO | S |
| Large neighborhood search [48] | LS | SO | S |
| HC-SMCP [30] | LS | SO | S |
| SHC [49] | LS | SO | Se |
| Bunch-GA [5] | GS | SO | S |
| DAGC [15] | GS | SO | S |
| A multiagent evolutionary algorithm [50] | GS | SO | S |
| Harmony search [51] | GS | MO | S |
| GA-SMCP [30] | GS | MO | S |
| Hyperheuristic approach [52] | GS | MO | S |
| ECA and MCA [12] | GS | MO | S |
| Estimation of distribution approach [22] | GS | SO | S |
| EoD, CGH, CGoH [8] | GS | MO | Both S and Se |
| Search-based multiobjective software remodularization [53] | GS | MO | S |
| Multiple relationship factors [54] | GS | MO | S |
| Interactive evolutionary optimization [55] | GS | MO | S |
| GAKH [56] | GS | SO | S |
| MaABC [57] | GS | MO | S |
| HGA [45] | GS | SO | S |
| ILOF [58] | GS | Support SO and MO | Both S and Se |

LS: local search; GS: global search; SO: single objective; MO: multiobjective; S: structural; Se: semantic.

In the hierarchical methods, all the artifacts are initially considered as units of modularization, and during a repetitive process, the most similar modules are merged to create a new module. Single-linkage, complete-linkage, and average-linkage algorithms are the most common hierarchical clustering algorithms, which Maqbool et al. adapted to modularize source code [59]. Kuhn et al. proposed a new algorithm using average linkage that used nonstructural features for modularization [60]. These authors used program code property attributes and variable naming to recognize relationships, which makes the output of the algorithm dependent on how knowledgeably developers write descriptions and name variables. Andritsos and Tzerpos introduced a method called LIMBO [61], a hierarchical algorithm combining structural and nonstructural information. This algorithm is a hierarchical sampling algorithm based on minimizing the loss of information during the modularization of a software system. Rathee et al. [62] proposed a new hierarchical technique of software remodularization by estimating conceptual similarity among software artifacts, which uses structural and semantic coupling measurements together to obtain more accurate coupling measures. They also presented a new weighted dependency measurement scheme that combines structural, conceptual, and change history-based relations among software elements.

In addition to the search-based and hierarchical methods discussed above, there are a number of graph-based and pattern-based methods. Mohammadi and Izadkhah in [40] use a neighboring tree generated from the ADG to cluster a software system. The clustering quality obtained by this algorithm is better than hierarchical methods and less than evolutionary methods. Spectral methods [63] use algebraic properties of the graph, such as eigenvalues and eigenvectors in the corresponding Laplacian matrix to perform clustering. Algorithm for comprehension-driven clustering (ACDC) [64] is a pattern-based algorithm that was introduced by Tzerpos and Holt. It uses several patterns to cluster code artifacts.

2.1. Gaps in the Literature

Using hierarchical property is not practically new and has been used for many years in the remodularization field, but there is no previous research using the hierarchical property with an evolutionary approach for remodularization. Due to the NP hardness of the modularization problem, most modularization methods utilize search-based clustering methods and evolutionary algorithms [8, 12]. These clustering algorithms show only a flat modularization of a program. Therefore, these algorithms cannot represent the hierarchy properties of a program, so there is no way to specify the encapsulation levels, e.g., module, package, and component, in it by the designer.

3. The Proposed Clustering Algorithm

Most of the work on remodularization is based on clustering techniques [31]. The hierarchical clustering algorithms proposed up to now are greedy and make arbitrary decisions that may lead to undesired results. Moreover, these algorithms do not recommend an appropriate cutpoint in the dendrogram or modularization points from different levels of it. In this section, a new clustering algorithm with a hierarchical approach is proposed for source code remodularization which does not have these problems. To this end, we design a genetic algorithm with a new encoding and fitness function. The encoding is utilized to construct a tree from the artifacts of the source code, and the fitness function, with a branch and bound approach, is applied to determine appropriate levels in the constructed tree, whose result can be a qualified modularization. To improve the quality of the resulting modularization, we also designed a hill-climbing algorithm. This local search algorithm is applied to the outcome of the genetic algorithm for a neighboring search. The input of the algorithm is an ADG constructed from the source code, and its output is a modularization suggested to the software maintainer. Our method supports “big-bang” remodularization; i.e., all the artifacts of the software system are considered to perform modularization, and the current structure (modularization) is not considered.

We consider classes and files to be the smallest composing units (artifacts) for modularization in object-oriented and structured software systems, respectively. These parts are combined into larger modules such as packages or components, in which the members of each module contribute to the other parts of that module toward a single responsibility. Hence, it is important to have proper upper-level compositions. We also use call dependency to create a dependency, i.e., an edge in the ADG, between two artifacts. Artifacts that are only called by other artifacts are utility classes or files. They can therefore be removed at the beginning and addressed after the algorithm completes. For each such artifact, if all calls come from one module, the artifact is added to that module. If it is used by multiple modules, it is considered a utility.
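The utility handling described above can be sketched as a pre-processing step and a post-processing step. This is only an illustration under stated assumptions: an artifact is treated as "only called" when it has no outgoing call edge, and the names `split_utilities`, `place_utilities`, and the sample artifacts are hypothetical.

```python
def split_utilities(edges):
    """Separate artifacts that are only called (no outgoing calls) from the rest."""
    callers = {src for src, _ in edges}
    callees = {dst for _, dst in edges}
    utilities = callees - callers
    core_edges = [(s, d) for s, d in edges if d not in utilities]
    return utilities, core_edges

def place_utilities(utilities, edges, module_of):
    """Attach a utility to its single calling module, else label it 'utility'."""
    placement = {}
    for u in utilities:
        calling_modules = {module_of[s] for s, d in edges if d == u}
        if len(calling_modules) == 1:
            placement[u] = calling_modules.pop()
        else:
            placement[u] = "utility"
    return placement

# Hypothetical call edges; "log" and "fmt" never call anything.
edges = [("A", "B"), ("B", "A"), ("A", "log"), ("B", "log"),
         ("C", "D"), ("D", "C"), ("C", "fmt"), ("A", "fmt")]
utilities, core_edges = split_utilities(edges)

# Assume the clustering step produced this modularization of the core artifacts.
module_of = {"A": "M1", "B": "M1", "C": "M2", "D": "M2"}
placement = place_utilities(utilities, edges, module_of)
print(placement)
```

Here `log` is called only from module M1 and joins it, while `fmt` is called from both modules and stays a utility.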

To design a genetic-based algorithm, five features must be described: encoding (chromosomal representation), fitness function (evaluation), selection, crossover, and mutation.

3.1. Encoding

A chromosome in a GA is a collection of parameters that represents a solution to the problem. The aim of the GA is to find a chromosome with an optimal or near-optimal solution. These parameters can be a binary string or any other data structure. In this paper, the Prufer sequence [65] is employed to encode the tree as a sequence of numbers, which serves as a chromosome. The Prufer sequence is a one-to-one mapping between a sequence of numbers and a labeled tree. The steps of constructing the Prufer sequence of a tree are shown in Algorithm 1. Let $P$ denote a Prufer sequence. The corresponding tree of a Prufer sequence is constructed by Algorithm 2.

Algorithm 1: Constructing the Prufer sequence of a labeled tree.
Input: a labeled tree as T(V, E)
Output: a Prufer sequence
Comment: the nodes of T are labeled from 1 to the size of V; v_i represents the node with label i.
PruferSequence ← empty list
while V.size is greater than 2 do
 v_i ← the leaf in V with the smallest label
 j ← label of the node connected to v_i
 Remove v_i from V and (v_i, v_j) from E
 Add j to PruferSequence
end while
return PruferSequence

Algorithm 2: Constructing the labeled tree of a Prufer sequence.
Input: a Prufer sequence as P
Output: a labeled tree
degrees ← a list with size P.size + 2 filled by 1
for a ∈ P do
 degrees[a] ← degrees[a] + 1
end for
# Now we know that node v has degree degrees[v] in the tree
i ← 1
T ← an empty list for tree edges
while at least three items in degrees are nonzero do
 j ← the smallest label with degrees[j] = 1
 Add (P[i], j) to T
 degrees[j] ← degrees[j] − 1
 degrees[P[i]] ← degrees[P[i]] − 1
 i ← i + 1
end while
a, b ← the labels of the two nonzero items in degrees
Add (a, b) to T
return T

For example, the Prufer sequence for the tree in Figure 5 is (2, 1, 3, 3, 1), and vice versa. To encode the tree as a Prufer sequence, the node with label 4 (the leaf node with the smallest number) is removed and number 2 is added to the sequence. Then, the node labeled 2 is removed and number 1 is added to the sequence. In the next two steps, the nodes 5 and 6 are removed and number 3 is added to the sequence twice. In the final step, node 3 is removed and number 1 is added as the last number of the sequence.
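The encoding and decoding steps can be sketched in Python as follows. This is a minimal illustration of the standard Prufer construction, not the authors' implementation. The edge list below is inferred from the removal steps described above; in particular, the final edge (1, 7) is an assumption, since the figure itself is not reproduced here.

```python
import heapq

def prufer_encode(n, edges):
    """Encode a labeled tree on nodes 1..n as a Prufer sequence."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)          # leaf with the smallest label
        neighbor = adj[leaf].pop()            # its only remaining neighbor
        adj[neighbor].remove(leaf)
        seq.append(neighbor)
        if len(adj[neighbor]) == 1:           # neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return seq

def prufer_decode(seq):
    """Rebuild the edge list of the labeled tree from its Prufer sequence."""
    n = len(seq) + 2
    degree = [1] * (n + 1)                    # index 0 is unused
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)          # smallest label of degree 1
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

# Edge list inferred from the worked example (edge (1, 7) is assumed).
example_edges = [(4, 2), (2, 1), (5, 3), (6, 3), (3, 1), (1, 7)]
sequence = prufer_encode(7, example_edges)
decoded = prufer_decode(sequence)
print(sequence)
```

Decoding the resulting sequence yields the same set of edges, which illustrates the one-to-one mapping between labeled trees and Prufer sequences.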

In the proposed method, the trees are binary trees, and, for a system with $n$ artifacts, the Prufer sequences obey the following rules:
(1) The trees always have leaves numbered from 1 to $n$ for artifacts and inner nodes numbered from $n+1$ to $2n-1$.
(2) All the artifacts are in the leaves of the tree, whose degree is one. Hence, the numbers 1 to $n$ do not appear in the corresponding Prufer sequence.
(3) The root of the tree (node number $2n-1$) has degree 2, and according to the rules of creating the Prufer sequence, it appears only once in the sequence.
(4) All inner nodes except the root have degree 3 (each is attached to its parent node and has two child nodes) and appear twice in the sequence.

Hence, each sequence over the numbers $n+1$ to $2n-1$ (where $n$ is the number of artifacts) in which each of the numbers $n+1$ to $2n-2$ appears twice and the number $2n-1$ appears once represents a hierarchical modularization tree in this algorithm. Figure 6 shows the hierarchical modularization tree corresponding to a Prufer sequence.
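These multiplicity rules are easy to check mechanically. The sketch below assumes $n$ artifacts labeled 1 to $n$ and inner nodes labeled $n+1$ to $2n-1$ with the root labeled $2n-1$, so a valid chromosome has length $2n-3$. The function names are hypothetical, and the random generator is just one simple way to build an initial population.

```python
import random
from collections import Counter

def is_valid_chromosome(seq, n):
    """Check that inner nodes n+1..2n-2 appear twice and the root 2n-1 once."""
    expected = Counter({v: 2 for v in range(n + 1, 2 * n - 1)})
    expected[2 * n - 1] = 1
    return Counter(seq) == expected

def random_chromosome(n, rng=random):
    """Build a random sequence that satisfies the multiplicity rules (n >= 2)."""
    seq = [v for v in range(n + 1, 2 * n - 1) for _ in range(2)]
    seq.append(2 * n - 1)
    rng.shuffle(seq)
    return seq

# With n = 4 artifacts, inner nodes are 5, 6, 7 and the root is 7.
ok = is_valid_chromosome([5, 5, 6, 6, 7], 4)
bad = is_valid_chromosome([5, 6, 7, 7, 6], 4)   # root appears twice
sample = random_chromosome(10, random.Random(0))
print(ok, bad, len(sample))
```

Because any permutation of a valid chromosome is again valid, operators that only rearrange genes keep every individual decodable into a binary hierarchy.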

3.2. Evaluation

Each chromosome in the population of a GA should be evaluated to determine the quality of the solutions. In the following, we propose a new quality function to evaluate the chromosomes. In the proposed quality function, the fitness of a chromosome is calculated by using the dependencies between the modules extracted from the corresponding tree of the chromosome. For a node (module), three quantities are used: the number of connections between the artifacts inside the node (internal connections), the number of connections with the artifacts in the sibling node (sibling coupling), and the number of connections with all other artifacts (external coupling). The fitness of a node (i.e., a module) is calculated by the function exCF, defined in equation (1) in terms of these three quantities.

This relation aims to increase cohesion in a module and reduce coupling with other modules. Coupling, however, is separated into two types: sibling coupling and external coupling. When the external coupling of a node is greater than its sibling coupling, the node has fewer connections with the artifacts in its sibling node than with the other artifacts. This module (regardless of its cohesion) is then not in a proper position, and we penalize the total score by assigning -1 to this node. Algorithm 3 shows the pseudocode of the evaluation part of this customized genetic algorithm. To evaluate the tree and propose a modularization according to its structure, the tree is traversed by the breadth-first search (BFS) algorithm from the root. During the traversal, if the sum of exCF for the two child nodes of a node is greater than or equal to the exCF of that node, the child nodes are added to the process queue. If not, this node is the lowest appropriate position to compose the artifacts in the leaves of its subtree as a module. When a node is partitioned into two child nodes, if the external coupling exceeds the sibling coupling for one of the child nodes, that child cannot be part of the tree because its outer relations outnumber its relations with its sibling node. In this case, exCF is equal to -1, and the child nodes will not be added to the BFS process queue. The total fitness of the tree is calculated by (2), where K is the set of all nodes whose children (if they exist) did not proceed. As Algorithm 3 shows, this total is the sum of exCF over the nodes in K:

$$\text{fitness} = \sum_{k \in K} \mathrm{exCF}(k) \quad (2)$$

Algorithm 3: Evaluation of a chromosome.
Input: a chromosome as ch
Output: update ch.fitness
ch.fitness ← 0
tree ← decode the chromosome ch to tree
q ← Queue()
tree.root.cf ← 1
q.push(tree.root)
while q is not empty do
 parent ← q.pop()
 if parent is a leaf then
  ch.fitness ← ch.fitness + parent.cf
 else
  child1, child2 ← parent.children()
  child1.cf ← exCF(child1, parent)
  child2.cf ← exCF(child2, parent)
  if child1.cf + child2.cf ≥ parent.cf then
   q.push(child1)
   q.push(child2)
  else
   ch.fitness ← ch.fitness + parent.cf
  end if
 end if
end while
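The traversal of Algorithm 3 can be sketched in Python as below. The scoring function is passed in as a callable, so no particular form of exCF is assumed; the tree, the score table, and the names `evaluate` and `ex_cf` are hypothetical and serve only to exercise the control flow. Leaves that reach the queue are counted as modules, following the "children (if exists)" wording of the text.

```python
from collections import deque

def evaluate(children, root, ex_cf):
    """Return (fitness, modules) using the BFS cut rule of Algorithm 3.

    children: dict mapping an inner node to its (child1, child2) pair;
              leaves do not appear as keys.
    ex_cf:    callable (node, parent) -> score; supplied by the caller.
    """
    cf = {root: 1.0}
    queue = deque([root])
    fitness, modules = 0.0, []
    while queue:
        parent = queue.popleft()
        if parent in children:
            c1, c2 = children[parent]
            cf[c1] = ex_cf(c1, parent)
            cf[c2] = ex_cf(c2, parent)
            if cf[c1] + cf[c2] >= cf[parent]:
                queue.extend((c1, c2))      # splitting does not lose quality
                continue
        fitness += cf[parent]               # parent becomes a module
        modules.append(parent)
    return fitness, modules

# Hypothetical tree with 4 artifacts (1..4) and inner nodes 5, 6, 7 (root).
children = {7: (5, 6), 5: (1, 2), 6: (3, 4)}
# Made-up scores standing in for exCF; -1 marks a misplaced node.
scores = {5: 0.5, 6: 0.75, 1: 0.25, 2: 0.125, 3: -1.0, 4: 0.5}
fitness, modules = evaluate(children, 7, lambda node, parent: scores[node])
print(fitness, modules)
```

In this toy run the root is split because its children score 1.25 in total, while nodes 5 and 6 are kept as modules because splitting either would lower the score.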

Figure 7 shows an example of evaluating a tree in this algorithm. This tree has 55 nodes (28 nodes for artifacts in the leaves and 27 inner nodes numbered from 29 to 55), and the numbers in parentheses are the exCF of each node. When the evaluation starts, the nodes number 37 and 42 are added to the process queue because the sum of their exCF is greater than the exCF of the parent node 55 (i.e., 1). The tree is traversed until it reaches the nodes colored in grey. Each of these nodes contains all artifacts in the leaves of its subtree and is the first position proposed by the algorithm to create modules. Their child nodes were not added to the BFS queue because the sum of the exCF of the sibling nodes is not greater than or equal to the exCF of the parent.

The three operations of GA for this algorithm are described as follows:
(1) Selection: the classic roulette wheel selection operator is used to select the next generation of the population.
(2) Crossover: the cycle crossover operation (CX) [66] is selected for this algorithm; it finds a cycle of genes between two parents and swaps the other genes. Given two parents, one random position is selected first. If the two parents hold different values at this position, one of the locations in the first parent that holds the second parent's value is selected, and this new position is added to the list of selected positions. These selections continue until the cycle closes. When finished, the values at the selected positions in the first parent are a permutation of the values at the same positions in the second parent. Finally, the values at all other (unselected) positions are swapped between the two parents. Figure 8 shows an example of the crossover operation. In this example, the first position is selected randomly, and then the third and fourth positions are added to the selection list, respectively, to create a cycle. Values 6, 9, and 8 in the first parent are a permutation of 8, 6, and 9 in the second. In the last step, the values at the other positions are swapped with the corresponding positions in the other chromosome. The output of CX is a permutation of the input. Hence, it does not violate the rules mentioned in the Encoding section. However, the structure of the tree (the relationships between nodes) changes.
(3) Mutation: the single swap operation is used to mutate a chromosome; the values at two random positions in the sequence are swapped. Figure 9 shows an example of the single swap operation on a Prufer sequence. This change creates a new binary tree.
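The crossover and mutation operators described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, chromosomes are plain lists of integers, both parents are assumed to contain the same multiset of values, and when a value occurs more than once one matching position is picked at random.

```python
import random
from typing import List, Sequence, Tuple


def cycle_positions(p1: Sequence[int], p2: Sequence[int], start: int,
                    rng: random.Random) -> List[int]:
    """Collect the positions of one cycle between p1 and p2, beginning at `start`."""
    selected = [start]
    pos = start
    # The cycle closes once the second parent's value at the current position
    # equals the first parent's value at the starting position.
    while p2[pos] != p1[start]:
        candidates = [i for i, v in enumerate(p1)
                      if v == p2[pos] and i not in selected]
        pos = rng.choice(candidates)  # "one of the locations" when values repeat
        selected.append(pos)
    return selected


def cycle_crossover(p1: Sequence[int], p2: Sequence[int],
                    rng: random.Random) -> Tuple[List[int], List[int]]:
    """Keep the genes on one cycle and swap every other gene between the parents."""
    start = rng.randrange(len(p1))
    keep = set(cycle_positions(p1, p2, start, rng))
    child1 = [p1[i] if i in keep else p2[i] for i in range(len(p1))]
    child2 = [p2[i] if i in keep else p1[i] for i in range(len(p1))]
    return child1, child2


def single_swap(chromosome: Sequence[int], rng: random.Random) -> List[int]:
    """Swap the values at two distinct random positions."""
    i, j = rng.sample(range(len(chromosome)), 2)
    mutated = list(chromosome)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```

With parents `[6, 1, 9, 8, 2]` and `[8, 2, 6, 9, 1]` and the first position as the starting point, the cycle covers the first, third, and fourth positions, which mirrors the values 6, 9, 8 and 8, 6, 9 in the example of Figure 8. Because the kept positions hold the same multiset of values in both parents, each child remains a rearrangement of its parents' values.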

3.3. Neighboring Search

A genetic algorithm is a global search. To improve the quality of the modularization produced in the last step of the GA, we design a hill-climbing local search strategy. The local search tries to produce a neighboring modularization of better quality than the current one. We use the steepest ascent strategy for searching neighboring modularizations: all neighbors of the current modularization are generated, and the one with the highest quality is selected and replaces the current modularization. This operation is repeated for the new modularization until no better modularization can be found. How the neighborhood is defined is very important in a hill-climbing algorithm, and an appropriate neighborhood must be defined for each type of problem.

3.4. Definition: Neighbor of a Modularization

Let M be a modularization of an ADG. A second modularization is called a neighbor of M if it is obtained from M by moving one artifact from module i to module j. In other words, two modularizations are neighbors if they differ only in the position of a single node. Let G = (V, E) be a dependency graph, where V represents artifacts and E represents dependencies between artifacts. For example, Figure 10 depicts a sample modularization, and Figure 11 shows a neighbor of it. The formal definition of this concept is as follows.

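Let M represent the modules obtained for the dependency graph, and take a node that belongs to one of these modules. A neighbor of M is created by removing this node from its current module and adding it to another module that has at least one relation to the node. The neighbor is better than M if its exTMQ is greater than the exTMQ of M.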
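The neighborhood definition and the steepest ascent search can be sketched together as follows. This is a simplified illustration with hypothetical names: a modularization is represented as a dictionary from artifact to module id, and the quality function is passed in by the caller (the paper uses exTMQ, which is not reimplemented here).

```python
from typing import Callable, Dict, Iterator, Set, Tuple

Edge = Tuple[str, str]
Modularization = Dict[str, int]  # artifact name -> module id


def neighbors(m: Modularization, edges: Set[Edge]) -> Iterator[Modularization]:
    """Yield every modularization reached by moving one artifact to a related module."""
    for v, current in m.items():
        # Modules that contain at least one artifact related to v.
        related = ({m[b] for a, b in edges if a == v}
                   | {m[a] for a, b in edges if b == v})
        for target in sorted(related - {current}):
            moved = dict(m)
            moved[v] = target
            yield moved


def steepest_ascent(
    m: Modularization,
    edges: Set[Edge],
    quality: Callable[[Modularization, Set[Edge]], float],
) -> Modularization:
    """Move to the best neighbor repeatedly until no neighbor improves the quality."""
    current, current_q = dict(m), quality(m, edges)
    while True:
        scored = [(quality(n, edges), n) for n in neighbors(current, edges)]
        if not scored:
            return current
        best_q, best = max(scored, key=lambda pair: pair[0])
        if best_q <= current_q:  # no strictly better neighbor: local optimum
            return current
        current, current_q = best, best_q
```

Requiring a strict improvement at every step guarantees termination, since the quality increases monotonically over a finite set of modularizations.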

In the following, we compute the time complexity of the algorithm. Let , , and represent the number of artifacts, population size, and the number of generations, respectively. We have the following:(1)To initiate the population, a chromosome with length is generated in which all numbers between 1 and are repeated twice and one . Then, for each chromosome, a shuffle (replacing each genome with a random one) is applied on it to generate a new random chromosome. So, the order of this step is .(2)To evaluate the chromosome, the data are converted to a tree in , and then, the tree is explored in . Hence, the order of evaluation is .(3)Selection step with roulette wheel is in order .(4)The crossover for each pair will be in , and the mutation is a simple swap in order . So, this step for whole population will be in order .

Steps 2–4 will be repeated times. Hence, the total order is . In this paper, is . So, the order is .

In the last step, a NAHC algorithm is applied to search in neighbors for better solution. Each solution will have at most clusters, and each iteration of NAHC will cost . So, for iteration, it will be .

According to the paragraphs above, the total order is , but, in practice, is a small number and the total order can be explained by .

4. Experimental Setup

In this section, we outline in detail the experimental setup we carried out to empirically assess the proposed clustering algorithm.

4.1. Case Study

Mozilla Firefox, a web browser, is a large-scale open-source application developed by the Mozilla Foundation and its subsidiary, Mozilla Corporation. According to the Open Hub report (http://www.openhub.net), it is the most popular open-source project and has the largest development team in the world, with more than 13000 developers. We select Mozilla Firefox 3.7, a developer preview version, for the experiments (https://ftp.mozilla.org/pub/). This version is stable and has approximately five million lines of code. Seven folders with different sizes and functionalities are chosen from this software system. Details of these folders are listed in Table 2. In addition, five medium-size open-source software systems have been chosen; their details are given in Table 3. In all experiments, a file is considered as an artifact.


Folder name | Number of files | Number of links | Number of modules | Folder functionality
ACCESSIBLE | 179 | 293 | 8 | Enabling as many people as possible to use web sites, even when those people’s abilities are limited in some way; files for accessibility (i.e., MSAA (Microsoft Active Accessibility) and ATK (accessibility toolkit, used by GTK+ 2) support files)
BROWSER | 45 | 45 | 4 | Contains the front-end code (in XUL, JavaScript, XBL, and C++) for the Firefox browser; contains the front-end code for the DevTools (scratchpad and style editor); contains images and CSS files to skin the browser for each OS (Linux, Mac, and Windows)
DOM | 163 | 324 | 5 | IDL definitions of the interfaces defined by the DOM specifications as the parts of the connection between JavaScript and the implementations of DOM objects; implementations of a few of the core “DOM level 0” objects, such as window, window.navigator, and window.location
EXTENSIONS | 179 | 206 | 13 | Contains several extensions to Mozilla, which can be enabled at compile-time; implementation of the negotiate auth method for HTTP and other protocols; has code for SSPI and GSSAPI; content- and locale-pack switching user interface; permissions backend for cookies, images, etc., as well as the user interface to these permissions and other cookie features; support for the datetime protocol; support for the finger protocol; a two-way bridge between the CLR/.NET/Mono/C#/etc. world and XPCOM; implementation of W3C’s platform for privacy preferences standard; support for implementing XPCOM components in Python; support for accessing SQL databases from XUL applications; support for web services
GFX | 342 | 644 | 7 | Contains interfaces that abstract the capabilities of platform-specific graphics toolkits, along with implementations on various platforms; these interfaces provide methods for things like drawing images, text, and basic shapes; it also contains basic data structures such as points and rectangles used here and in other parts of Mozilla
INTL | 573 | 957 | 7 | Internationalization and localization support; code for “sniffing” the character encoding of web pages; code for dealing with complex text layout, related to shaping of south Asian languages; code related to determination of locale information from the operating environment; code that converts (both ways: encoders and decoders) between UTF-16 and many other character encodings; code related to implementation of various algorithms for Unicode text, such as case conversion
IPC | 39 | 159 | 4 | Container for implementations of IPC (interprocess communication)


System | Description | Artifacts type | # of artifacts | # of edges
Mini-Tunis | Academic operating system | C file | 20 | 28
JUnit 4 | Unit testing framework | Java classes | 23 | 32
Servlet-API | Java server API | Java classes | 32 | 24
Easy mock | Dynamic mock object generator | Java classes | 84 | 118
Calculator | Microsoft calculator | C++ files | 210 | 343

The authoritative decomposition (domain expert decomposition or ground-truth structure) is used to evaluate the soundness of a remodularization algorithm [67]. The closer the remodularization generated by an algorithm is to the decomposition given by a domain expert, the better the remodularization algorithm is considered to perform [67]. As in [14, 67], we use the directory structure to obtain an expert decomposition from the source code. In this paper, we used Mozilla Firefox and five other software systems whose authoritative decomposition (i.e., directory structure) is available to assess the proposed algorithm. For example, the “extensions” folder has 179 files that Mozilla Firefox developers have assigned to 13 subfolders (packages). Using a purpose-built tool, we merged the files from the different subfolders into a single folder so that these 179 files are treated as a flat set. After modularizing the flattened files, the aim is to measure how similar the modularization achieved by the proposed algorithm is to the directory structure implemented by the Mozilla Firefox developers. In other words, the proposed algorithm is applied to the flattened folder in order to reconstruct (or improve) the original structure.

4.2. Research Questions

To evaluate the effectiveness of TDHC, we answer the following research questions:
RQ1. Does the proposed clustering approach produce modularization having a better precision, recall, F-measure, MoJo, and MoJoFM compared to existing approaches?
RQ2. Is TDHC a stable algorithm?
RQ3. By using TDHC, can we give a better view of hierarchical modularization?

To answer these research questions, five software systems and the seven folders of Mozilla Firefox are remodularized by the proposed clustering algorithm and some other available clustering algorithms.

4.3. Algorithmic Parameters

Setting the parameters is necessary for search-based algorithms. We obtained the implementations of five of the selected clustering techniques from their original authors or official web sites: ACDC (https://wiki.eecs.yorku.ca/project/cluster/protected:acdc), Bunch (https://www.cs.drexel.edu/spiros/bunch/) (SAHC and GA), SGA and SNDGA (https://github.com/Masoud-Kargar-QIAU), and EoD. In addition, we obtained working implementations of DAGC, ECA, and MCA from https://github.com/Masoud-Kargar-QIAU.

The values of the crossover and mutation rates affect the exploration and exploitation of the solution space during the evolutionary process. Adding one extra artifact to the input of this problem adds two genes to each chromosome; hence, the problem space grows exponentially. The crossover and mutation rates are therefore set dynamically based on the population to cover the solution space better. The crossover rate is usually chosen above 0.7, and the mutation rate is usually very low. In this research, 0.7 and 0.9 are selected as the boundaries of the crossover rate, which changes in linear steps. Because the mutation rate changes in logarithmic steps, it does not increase much. Table 4 shows the parameter settings for TDHC, in which N is the number of artifacts after the preprocessing operation. For TDHC, we followed the algorithmic parameter settings used in [12, 30]. The algorithmic parameters depend on the number of artifacts (N).


Population size ()

Maximum generation count
Crossover rate
Mutation rate
Crossover operation | Cycle crossover
Mutation operation | Single swap

As in [8, 12, 14], to reduce randomness in the results of our experiments, we collect the average and best of 30 independent runs. To perform a fair comparison, the average of runs is used, and to determine the performance of an algorithm, the best value of runs is utilized.

4.4. Assessment of Results

The comparison is performed by comparing the modules in the leaves of the solution tree with the modules in the source code (as developed by the expert team) using the precision/recall [4], F-measure [4], and MoJoFM [68] metrics. The precision/recall metric compares the modularization obtained by the proposed algorithm against the expert modularization by (3), in which TP (true positive) is the number of comodules that are relevant (they appear in the original modularization) and were retrieved correctly by the algorithm, FP (false positive) is the number of comodules that are irrelevant but were retrieved, and FN (false negative) is the number of comodules that are relevant but were not retrieved. F-measure is defined as the harmonic mean of precision and recall (4). High values of precision/recall and F-measure show more similarity between two modularizations:
Precision = TP / (TP + FP), Recall = TP / (TP + FN), (3)
F-measure = 2 × Precision × Recall / (Precision + Recall). (4)
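As a concrete illustration of these metrics, the following sketch counts TP, FP, and FN over pairs of artifacts. It assumes that a "comodule" is a pair of artifacts placed in the same module, which is one common reading and may differ from the exact counting used in the experiments; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def co_module_pairs(m: Dict[str, str]) -> Set[FrozenSet[str]]:
    """All unordered artifact pairs that share a module in modularization m."""
    return {frozenset((a, b)) for a, b in combinations(sorted(m), 2) if m[a] == m[b]}


def precision_recall_f(found: Dict[str, str],
                       expert: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure of `found` against the expert decomposition."""
    got, want = co_module_pairs(found), co_module_pairs(expert)
    tp = len(got & want)   # pairs grouped together in both modularizations
    fp = len(got - want)   # pairs grouped only by the algorithm
    fn = len(want - got)   # pairs grouped only by the expert
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For instance, if the expert groups {a, b} and {c, d} while the algorithm groups {a, b, c} and {d}, only the pair (a, b) is a true positive, which gives a precision of 1/3, a recall of 1/2, and an F-measure of 0.4.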

Let mno denote the number of move or join operations needed to transform one modularization into another. The MoJoFM between an extracted modularization A and the original modularization F is calculated with the relationship shown in (5). A high value of MoJoFM shows more similarity between two modularizations:
MoJoFM(A, F) = (1 − mno(A, F) / max(mno(∀A, F))) × 100%. (5)

To compare the overall results of TDHC against the other tested algorithms in terms of precision/recall, F-measure, and MoJoFM, we utilized a nonparametric effect size statistic, namely, Cliff’s δ, which quantifies the amount of difference between two algorithms.

Because the algorithms produce different results on different criteria, deciding which algorithm performs best when all criteria are considered is not easy. In such circumstances, multicriteria decision-making (MCDM) can be utilized [69]. This technique measures the performance of the various algorithms and assigns each algorithm a value between zero and one, where zero indicates the weakest performance and one indicates the best performance. To this end, let n and m denote the number of algorithms and the number of criteria, respectively. A matrix, called X, is created, and then the efficiency of each algorithm is calculated based on entropy. Algorithm 4 shows these steps.

Input: n: number of algorithms
  m: number of criteria
  X: an n × m matrix in which xij is the value of algorithm i for criterion j
 Step 1:
 Step 2: calculate the entropy value, Hj, for each criterion j
 Step 3: calculate the importance and weight of each criterion
 Step 4: calculate the maximum and minimum vectors of each criterion
 Step 5: calculate the positive and negative ideal distances from reality
 Step 6: calculate the positive and negative ideal distances for each algorithm
 Step 7: calculate the efficiency of each algorithm
 Step 8: select the best algorithm
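Since the formulas of Algorithm 4 are not reproduced above, the following sketch shows one common way to realize such steps: entropy-based criterion weights combined with a TOPSIS-style ranking. All of it is an assumption made for illustration. In particular, the column-sum normalization, the Euclidean distances, and the treatment of every criterion as "higher is better" are choices of this sketch, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def entropy_topsis(x: List[List[float]]) -> Tuple[List[float], int]:
    """Return (efficiency per algorithm, index of the best algorithm)."""
    n, m = len(x), len(x[0])
    if n < 2:
        raise ValueError("at least two algorithms are required")
    # Normalize each criterion (column) so that its entries sum to one.
    col_sums = [sum(row[j] for row in x) for j in range(m)]
    p = [[row[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(m)] for row in x]
    # Entropy of each criterion, scaled to lie in [0, 1].
    k = 1.0 / math.log(n)
    h = [-k * sum(p[i][j] * math.log(p[i][j]) for i in range(n) if p[i][j] > 0)
         for j in range(m)]
    # Importance (1 - entropy) and normalized weight of each criterion.
    d = [1.0 - hj for hj in h]
    total = sum(d)
    w = [dj / total if total else 1.0 / m for dj in d]
    # Weighted matrix with its per-criterion maximum and minimum vectors.
    v = [[w[j] * p[i][j] for j in range(m)] for i in range(n)]
    v_max = [max(v[i][j] for i in range(n)) for j in range(m)]
    v_min = [min(v[i][j] for i in range(n)) for j in range(m)]
    # Distance of each algorithm to the positive and negative ideal points.
    efficiency = []
    for i in range(n):
        d_pos = math.sqrt(sum((v[i][j] - v_max[j]) ** 2 for j in range(m)))
        d_neg = math.sqrt(sum((v[i][j] - v_min[j]) ** 2 for j in range(m)))
        efficiency.append(d_neg / (d_pos + d_neg) if d_pos + d_neg else 0.0)
    best = max(range(n), key=lambda i: efficiency[i])
    return efficiency, best
```

Under these assumptions, an algorithm that is best on every criterion receives an efficiency of one and an algorithm that is worst on every criterion receives zero, which matches the zero-to-one scale described in the text.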

5. Empirical Study Results

To compare and evaluate the proposed algorithm, five software systems with different domains and sizes have been selected. Also, seven folders with different functionalities have been selected from the Mozilla Firefox application.

To answer the research question RQ1, nine search-based algorithms with different characteristics, including single-objective, multiobjective, global search, local search, structure-based, and semantic-based approaches, are chosen for comparison. The selected algorithms are Bunch-GA, DAGC, ECA, MCA, Bunch-SAHC, SGA, GA-SMCP, EoD, and SNDGA. The characteristics of these algorithms are described in Table 5. We also selected ACDC as a pattern-based algorithm for comparison. Several previous studies [9–11] have shown that ACDC routinely outperformed the others. Because ACDC is a pattern-based method, it produces the same clustering each time it is repeated, so its best and average results are always the same.


Algorithm | # of objectives used | Search type | Structural-based/semantic-based | Encoding type | Reference
Bunch-GA | Single objective | Global (GA) | Structural | Value-based | [5]
DAGC | Single objective | Global (GA) | Structural | Permutation-based | [15]
ECA | Multiobjective | Global (two-archive GA) | Structural | Value-based | [12]
MCA | Multiobjective | Global (two-archive GA) | Structural | Value-based | [12]
Bunch-SAHC | Single objective | Local (hill climbing) | Structural | Value-based | [7, 21]
SGA | Single objective | Global (GA) | Semantic | Value-based | [14]
GA-SMCP | Single objective | Global (GA) | Structural | Value-based | [30]
EoD | Multiobjective | Global (estimation of distribution) | Semantic and structural | Value-based | [8]
SNDGA | Single objective | Global (GA) | Semantic, nominal, and structural | Value-based | [46]

The best and average results of TDHC on the seven Firefox folders and the five other software systems are compared with the results of the selected state-of-the-art algorithms with different features in terms of precision, recall, F-measure, and MoJoFM. The details are reported in Tables 6–9.


Folder name | Bunch-GA | ECA | MCA | DAGC | Bunch-SAHC | SGA | GA-SMCP | EoD | ACDC | SNDGA | TDHC
(each cell: best (%) / average (%))
Browser | 70/65 | 60/52 | 72/68 | 45/36 | 52/48 | 55/50 | 70/65 | 52/49 | 66/66 | 68/64 | 80/78
Dom | 58/55 | 53/48 | 54/46 | 26/25 | 39/36 | 45/42 | 58/55 | 58/55 | 83/83 | 77/75 | 74/75
Accessible | 42/39 | 37/36 | 40/36 | 27/24 | 27/26 | 55/53 | 38/36 | 42/39 | 42/42 | 65/61 | 78/54
Extensions | 50/46 | 59/46 | 53/44 | 22/21 | 28/25 | 35/33 | 48/46 | 51/46 | 77/77 | 65/62 | 84/71
Gfx | 54/50 | 61/54 | 67/57 | 29/29 | 42/41 | 57/55 | 54/50 | 60/57 | 73/73 | 84/81 | 86/75
Ipc | 81/81 | 81/81 | 81/81 | 40/38 | 81/81 | 81/80 | 80/79 | 80/79 | 61/61 | 91/90 | 67/70
Intl | 80/76 | 75/75 | 84/81 | 41/39 | 75/75 | 78/76 | 71/68 | 75/71 | 92/92 | 86/80 | 89/79
Mini-Tunis | 80/78 | 80/76 | 80/73 | 71/69 | 80/71 | 80/78 | 59/51 | 80/73 | 46/46 | 80/75 | 84/67
JUnit 4 | 60/58 | 60/58 | 60/56 | 59/51 | 60/58 | 60/59 | 60/57 | 67/65 | 63/63 | 65/60 | 60/51
Servlet-API | 90/88 | 91/90 | 88/85 | 73/68 | 78/75 | 71/61 | 83/80 | 90/84 | 89/89 | 80/71 | 93/90
Easy mock | 78/73 | 75/69 | 71/60 | 61/53 | 63/51 | 65/54 | 78/69 | 78/73 | 67/67 | 60/49 | 87/91
Calculator | 40/37 | 41/38 | 41/35 | 33/31 | 38/31 | 41/38 | 40/37 | 41/39 | 44/44 | 62/54 | 85/79


Folder nameBunch-GAECAMCADAGCBunch-SAHCSGAGA-SMCPEoDACDCSNDGATDHC
Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)

Browser83623332110096754928554879718476959556417358
Dom534127244443223321383349395144828275565645
Accessible27223148562321713454532202622232344364232
Extensions262626193528321210363619167572696939334330
Gfx2523262165610.950.882015332521192620606051405350
Ipc5029252526260.430.425028573445385548959552415850
Intl4532151549470.510.441715726545424032949426217169

Mini-Tunis81788179817869638172686581788180353571548455
JUnit 463616160616057515953484161596160545468614643
Servlet-API5047514983804845484148425047504910010048435572
Easy mock41366561605338303528413541334538575745404138
Calculator3531373537321410128292520174033202040341412


Folder name | Bunch-GA | ECA | MCA | DAGC | Bunch-SAHC | SGA | GA-SMCP | EoD | ACDC | SNDGA | TDHC
(each cell: best (%) / average (%))
Browser | 65/55 | 51/46 | 56/51 | 70/54 | 42/36 | 35/32 | 55/51 | 61/57 | 10/10 | 56/52 | 60/49
Dom | 43/40 | 40/37 | 38/35 | 48/44 | 29/29 | 31/30 | 41/33 | 46/39 | 42/42 | 71/67 | 52/71
Accessible | 31/23 | 27/25 | 29/24 | 50/37 | 21/19 | 26/26 | 31/25 | 44/40 | 30/30 | 56/51 | 63/52
Extensions | 35/30 | 51/37 | 39/27 | 36/31 | 20/17 | 18/17 | 30/24 | 28/22 | 98/98 | 48/45 | 50/49
Gfx | 48/39 | 68/48 | 62/43 | 42/40 | 31/30 | 42/39 | 33/28 | 46/42 | 25/25 | 80/72 | 75/58
Ipc | 70/69 | 70/69 | 69/68 | 63/62 | 68/68 | 72/72 | 70/65 | 69/63 | 6/6 | 84/81 | 73/85
Intl | 74/72 | 62/61 | 75/72 | 60/60 | 60/59 | 71/70 | 58/44 | 76/71 | 94/94 | 82/74 | 84/63
Mini-Tunis | 71/63 | 73/70 | 71/70 | 48/39 | 68/61 | 60/51 | 60/52 | 71/69 | 92/92 | 69/52 | 78/67
JUnit 4 | 48/41 | 48/45 | 48/41 | 28/25 | 31/22 | 45/39 | 38/32 | 53/50 | 49/49 | 50/43 | 34/31
Servlet-API | 73/65 | 78/74 | 75/68 | 51/42 | 41/30 | 59/48 | 48/41 | 78/72 | 41/41 | 75/69 | 100/98
Easy mock | 79/72 | 81/74 | 81/76 | 52/45 | 38/31 | 80/73 | 68/61 | 83/78 | 32/32 | 78/70 | 90/90
Calculator | 48/40 | 55/43 | 61/50 | 28/19 | 21/12 | 50/42 | 39/35 | 55/44 | 52/52 | 55/49 | 89/80


Folder nameBunch-GAECAMCADAGCBunch-SAHCSGAGA-SMCPEoDACDCSNDGATDHC
Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)Best (%)Average (%)

Browser7357403771671394130413965597165181855446652
Dom454032304038333124343145364841565669655438
Accessible272124203632641815413932223228262645405141
Extensions302834253528541512242323192824818141384635
Gfx3329372963411822420353026233327353560516251
Ipc5139373737370.830.845839604555486155121264546561
Intl56442424595710.873024736651435244969637337759

Mini-Tunis76707774767557487466645769627674515170538160
JUnit 454495451544938344131464047415755515158503936
Servlet-API59556259797449434435534549446158585859547183
Easy mock54487267696244363629544751435851414151515653
Calculator40354439464019131510373126234738292946402421

In Table 6, TDHC has better performance in most cases; for the “dom” and “Intl” folders, the ACDC algorithm has better results in the best and average values, respectively. Table 7 shows that, in terms of precision, MCA and ACDC have the best performance among the algorithms. In Table 8, the algorithms are compared in terms of recall, for which TDHC has better performance in most cases. In Table 9, for F-measure, TDHC and SNDGA perform almost the same.

From Tables 6–9, we conclude that DAGC, ECA, Bunch-SAHC, GHA, and GA-SMCP, compared to the other algorithms, systematically provide an extremely low precision/recall, F-measure, and MoJoFM. On the other hand, if we ignore the precision criterion, TDHC is clearly among the best algorithms and is usually at the top. It often competes with ACDC, EoD, and SNDGA, which sometimes clearly outperform TDHC.

To compare the results of TDHC directly against the other algorithms, Cliff’s δ is calculated; the results are presented in Table 10. Cliff’s δ is a nonparametric effect size metric that quantifies the difference between two groups of observations (here, TDHC against the other tested algorithms). Its value ranges from −1 to 1, and a higher value shows that the results of the first group (here, TDHC) are generally better than those of the second group (the other algorithms). For interpretation, as in [10], the magnitude of the effect is classified as negligible, small, medium, or large. The results indicate that the values for MoJoFM, precision, recall, and F-measure of the TDHC output are generally better than those of the other algorithms.
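For reference, Cliff's δ can be computed directly from its definition, as in the following sketch. The function names are hypothetical, and the magnitude thresholds (0.147, 0.33, and 0.474) are the values commonly cited in the effect-size literature; they are an assumption here and may differ from the thresholds used in the experiments.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Share of pairs where x > y minus share of pairs where x < y."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta: float) -> str:
    """Label an effect size using commonly cited thresholds (an assumption)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

For example, comparing the samples [3, 4, 5] and [1, 2, 3] gives 8 of 9 pairs in favor of the first group and none against it, so δ = 8/9, a large effect under these thresholds.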


Metric name | Bunch | ECA | MCA | DAGC | Bunch-SAHC | SGA | GA-SMCP | EoD | ACDC | SNDGA
(each cell: best / average)
MoJoFM | 0.57/0.37 | 0.60/0.43 | 0.56/0.39 | 0.93/0.90 | 0.76/0.60 | 0.78/0.60 | 0.76/0.56 | 0.60/0.45 | 0.50/0.27 | 0.39/0.22
Precision | 0.22/0.26 | 0.42/0.31 | −0.01/−0.24 | 0.51/0.53 | 0.44/0.41 | 0.51/0.29 | 0.24/0.20 | 0.07/−0.02 | −0.14/−0.26 | 0.19/0.12
Recall | 0.28/0.41 | 0.22/0.43 | 0.21/0.36 | 0.51/0.72 | 0.60/0.74 | 0.49/0.69 | 0.51/0.69 | 0.20/0.35 | 0.27/0.35 | 0.17/0.17
F-measure | 0.29/0.25 | 0.44/0.31 | 0.10/−0.03 | 0.69/0.52 | 0.65/0.70 | 0.67/0.42 | 0.45/0.31 | 0.22/0.06 | 0.35/0.19 | 0.18/0.05

In addition to the above experiments, we use MCDM to compare the performance of the tested algorithms considering all criteria employed in the experiments. Table 11 shows that the modularization quality of TDHC is better than that of the other tested algorithms in most cases, with an acceptable difference. The numbers in Table 11 express the superiority of the algorithms: the closer a number is to one, the better the algorithm performed in that case relative to the rest across most experiments and most criteria.


BrowserDomAccessibleExtensionsGfxIpcIntlMini-TunisJUnit 4Servlet-APIEasy mockCalculator

Bunch-GA0.033400000.2521000.438000
ECA00000000000.7230.01
MCA0.864100.195300.294000.0300.011300.01
DAGC0.048100000000000
Bunch-SAHC000000000000
SGA000000000000
GA-SMCP000000000000
EoD0.0334000.33100000.50000.1
ACDC00.82600.66100.01010.32700.19000
SNDGA00.7310.1500.4980.8100.0410.552000.127
TDHC0.52100.12510.4310.3910.8100.98500.6910.8210.802

Regarding the research question RQ2, the genetic algorithm is a stochastic optimizer, so the results achieved may differ from run to run. For a stable algorithm, the results of several independent runs are expected to be close to each other. Therefore, to answer RQ2, the proposed algorithm is executed 30 times for each case, and the stability of the results is analyzed with the t-test. To apply the t-test, the results are divided into two groups of the same size, named G1 and G2, and descriptive and inferential statistics are extracted from them. According to [70], 30 rows of data are enough to assume that the distribution is normal, which is a critical condition for using the t-test. We also used the Wilcoxon signed-rank test [71], a nonparametric statistical hypothesis test, to check the stability of the results without assuming a normal distribution.

The results are presented in Table 12. The first three column groups show the mean, the standard deviation, and the standard error of the mean for the two groups, respectively, as descriptive statistics. The remaining columns show the output of the inferential statistics. Levene’s test is an inferential statistic for assessing the equality of variances of a variable calculated for two groups. If the p value (Sig. column in the table) is greater than the significance level (0.05 in our tests), the null hypothesis of equal variances cannot be rejected. The same reading applies to the Wilcoxon signed-rank test. The t-test columns of Table 12 report the results of the independent two-sample t-test with equal sample sizes and equal variances (according to the results of Levene’s test) on two randomly separated groups of TDHC results, and the last two columns report the results of the Wilcoxon signed-rank test, which applies if the data are not normally distributed. All the t-test and Wilcoxon significance values are greater than 0.05, which shows that we cannot reject the null hypothesis of equal means. Hence, the results of the different runs converge to an acceptable range.
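The test statistic of the independent two-sample t-test with pooled (equal) variances can be sketched with the Python standard library as follows. The function name is hypothetical, and the sketch stops at the statistic and the degrees of freedom; the corresponding significance value would come from a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(g1: Sequence[float], g2: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for two groups assumed to share one variance."""
    n1, n2 = len(g1), len(g2)
    df = n1 + n2 - 2
    # Pooled sample variance of the two groups.
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / df
    if sp2 == 0:
        raise ValueError("both groups are constant; the statistic is undefined")
    t = (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df
```

A small absolute value of t relative to the critical value for the given degrees of freedom means that the hypothesis of equal means cannot be rejected, which is the outcome the stability analysis looks for.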


Case study | Mean G1 | Mean G2 | Standard deviation G1 | Standard deviation G2 | Standard error between mean G1 | Standard error between mean G2 | Levene’s test F | Levene’s test Sig. | t-test T | t-test Sig. | Wilcoxon signed-rank test Z | Wilcoxon signed-rank test Sig.
Firefox-browser | 1.4714 | 1.5572 | 0.337 | 0.246 | 0.1472 | 0.1003 | 1.5044 | 0.2550 | 0.4702 | 0.651 | −0.73 | 0.465
Firefox-dom | 0.573 | 0.614 | 0.247 | 0.306 | 0.1175 | 0.1370 | 0.035 | 0.8571 | 0.281 | 0.8135 | −1.753 | 0.08
Firefox-accessible | 0.5228 | 0.5942 | 0.1275 | 0.1285 | 0.5702 | 0.0574 | 0.013 | 0.9670 | 0.882 | 0.4036 | −1.483 | 0.138
Firefox-extensions | 0.553 | 0.636 | 0.212 | 0.2391 | 0.096 | 0.103 | 0.268 | 0.6109 | 0.527 | 0.530 | −0.674 | 0.5
Firefox-gfx | 0.3344 | 0.2688 | 0.1468 | 0.1254 | 0.06456 | 0.05611 | 0.054 | 0.823 | 0.760 | 0.469 | −1.461 | 0.144
Firefox-ipc | 1.306 | 1.462 | 0.4081 | 0.436 | 0.179 | 0.195 | 0.348 | 0.572 | 0.521 | 0.5617 | −0.135 | 0.893
Firefox-intl | 0.347 | 0.319 | 0.364 | 0.1272 | 0.1628 | 0.0422 | 3.847 | 0.085 | 0.465 | 0.654 | −1.084 | 0.279
Mini-Tunis | 0.9726 | 1.0157 | 0.14256 | 0.14043 | 0.06376 | 0.06280 | 0.0181 | 0.014 | 0.674 | 0.643 | −0.135 | 0.893
JUnit 4 | 0.7950 | 0.7923 | 0.00759 | 0.01831 | 0.00339 | 0.00819 | 0.4701 | 0.2481 | 0.496 | 0.770 | −1.069 | 0.285
Servlet-API | 2.5234 | 2.5134 | 0.05345 | 0.5871 | 0.02390 | 0.02626 | 1.007 | 0.3277 | 0.690 | 0.785 | −0.73 | 0.465
Easy mock | 1.9572 | 2.0360 | 0.29716 | 0.24379 | 0.13289 | 0.10903 | 0.745 | 0.127 | 0.953 | 0.659 | −0.405 | 0.686
Calculator | 2.3094 | 2.3210 | 0.16658 | 0.29373 | 0.07449 | 0.13136 | 0.814 | 0.378 | 0.182 | 0.941 | −0.405 | 0.686


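Code | Name | Module | Code | Name | Module
20 | Main | 5 | 7 | File | 6
19 | User | 5 | 9 | FileTable | 6
16 | State | 5 | 4 | Directory | 6
2 | Control | 5 | 1 | Computer | 3
6 | Family | 5 | 14 | Memory | 3
8 | FileIO | 4 | 17 | System | 1
11 | INode | 4 | 15 | Panic | 1
12 | INodeGlobals | 4 | 5 | Disk | 2
13 | INodeTable | 4 | 18 | Tty | 2
10 | FreeINode | 4 | 3 | Device | 2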

To answer the research question RQ3, the output of TDHC for Mini-Tunis is investigated. Mini-Tunis (mtunis) is an academic operating system with 20 artifacts, numbered from 1 to 20 in Figure 12. Each artifact is a file, and all artifacts with the same number in parentheses are in the same module [5]. According to the figure, artifacts 15 and 17 are only called by other artifacts and can be set aside during preprocessing. Figure 13 shows the tree produced by the proposed algorithm for the 18 remaining artifacts, and Figure 14 presents the corresponding hierarchical modularization in a flat view. The number in parentheses for each artifact is its module number in the expert modularization. As shown in this figure, a new arrangement is proposed for the artifacts of modules 3 and 5 of the expert modularization, but they are joined together at the upper level. Artifacts 15 and 17 have relations with several of these modules, so they are identified as a new module whose artifacts are utility libraries. These two artifacts are in a separate module in the expert modularization too.

The most important advantage of this method is that it helps the designer specify the encapsulation levels, e.g., module, package, and component.

6. Threats to Validity

In this section, to clarify the validity of TDHC, the limitations that can affect the results of the algorithm are discussed. Several factors may bias the validity of the study. These are typically divided into two categories: external and internal validity. External validity concerns the ability to generalize the results to case studies other than those used or to different settings for them:
(1) The input of the algorithm is an ADG extracted from source code, and cohesion and coupling are considered as indicators for refactoring. Candela et al. [1] argued that cohesion and coupling are not enough for the remodularization of source code and that more indicators are probably needed. However, they did not discuss which other indicators could improve the quality of the modularization.
(2) In search-based techniques for source code remodularization, generalizing a technique to any software is an important threat to the validity of the results. For this reason, Mozilla Firefox, as a large-scale software system, is selected in this paper alongside five other medium-size open-source systems. It is important to note that only a few software systems have more artifacts (here, files) in a folder than Mozilla Firefox.

Internal validity is concerned with experimental treatments that affect the algorithm results, leading to poor results:
(1) In this paper, the precision, recall, F-measure, and MoJoFM metrics have been used to compare the study results with current modularization algorithms. These metrics are not necessarily in line with the expert developer’s opinion. Also, these metrics do not evaluate the structure of the tree, and none of them considers the edges between artifacts when calculating similarity.
(2) In the preprocessing step of TDHC, some artifacts may be set aside from the input of the tree generation step. In the end, it is important to suggest an appropriate position in the modules for them or to aggregate them as a new module.
(3) The rates of the crossover and mutation operators used in the GA were obtained from several experiments on the Mini-Tunis, JUnit, and Servlet-API software systems and applied to the other case studies. However, these numbers may not work well on other software systems.
(4) In the proposed algorithm, the labels of inner nodes are not important, and the Prufer sequence generates the same modularization for different codes. For example, and both represent the same modularization. In addition, the concept of neighborhood in this encoding is not transparent, and small changes in number positions make a great change in the structure of the output tree.

7. Conclusion and Future Work

During software maintenance and evolution, the structure of the software deviates from its original structure. Thus, source code refactoring plays an essential role in the software maintenance process. In this paper, a new clustering algorithm based on cohesion and coupling between artifacts is proposed. The method uses a top-down hierarchical approach with a metaheuristic algorithm (a combination of the genetic algorithm and hill climbing). The proposed algorithm suggests to developers a suitable point at which to start the modularization of artifacts. The input of the algorithm is the ADG, which is independent of the programming language of the source code. However, its preprocessing operations may depend on the programming language of the source code or on the type of input artifacts (class, file, function, or low-level module). Because the proposed refactoring method is automatic, it is intended to serve as an assistant to the developer. Design decisions are often more complex and subtle than simply maximizing cohesion and minimizing coupling in the modularization process. Consequently, the derived modularization is analyzed by the software developer, who can accept the proposed remodularization as is or change it by moving artifacts from one module (package) to another.

The following directions are suggested for future work:

(1) An increasing number of artifacts affects the quality of the solution proposed by the algorithm, because the search space grows exponentially with the input size. It is therefore important to improve the algorithm factors.

(2) Given the size of the search space for software source code, a new preprocessing method could be devised to reduce it. For example, in source code refactoring, a modularization proposed by the current developers already exists; the artifacts in a module are usually closely related to one another within that module, and only some of them are related to other modules. Therefore, the artifacts that have no relations to other modules can be ignored when calculating the relationships between modules.

(3) Many research studies use structural or nonstructural features for source code refactoring; these features could also be used in this top-down search-based algorithm.

(4) Other heuristic or metaheuristic algorithms could be used instead of the GA.
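
One possible reading of the preprocessing idea in item (2) is sketched below. It is only an illustration under stated assumptions: the dependency graph is given as a list of directed edges between artifact names, the developers' current modularization is a dictionary from artifact to module name, and the helper names `split_boundary_internal` and `inter_module_edge_counts` are hypothetical. Artifacts whose edges all stay inside their own module are separated from those that touch another module, and only the latter contribute to the relationships between modules.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def split_boundary_internal(
    edges: Iterable[Edge], module_of: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Split artifacts into those with cross-module edges and those without."""
    boundary: Set[str] = set()
    for src, dst in edges:
        if module_of[src] != module_of[dst]:
            boundary.update((src, dst))
    internal = set(module_of) - boundary
    return boundary, internal


def inter_module_edge_counts(
    edges: Iterable[Edge], module_of: Dict[str, str]
) -> Dict[Tuple[str, str], int]:
    """Count edges between distinct modules; intra-module edges are skipped."""
    counts: Dict[Tuple[str, str], int] = {}
    for src, dst in edges:
        a, b = module_of[src], module_of[dst]
        if a != b:
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


if __name__ == "__main__":
    # Toy dependency graph and module assignment (made up for illustration).
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
    module_of = {"a": "M1", "b": "M1", "c": "M1", "d": "M2", "e": "M2"}
    boundary, internal = split_boundary_internal(edges, module_of)
    print("boundary:", sorted(boundary))
    print("internal:", sorted(internal))
    print("module links:", inter_module_edge_counts(edges, module_of))
```

In the toy graph only `c` and `d` carry a cross-module dependency, so a search over module-level relationships would need to consider two artifacts instead of five. Whether such a reduction preserves solution quality on real systems is an open question that the future-work item leaves to be studied.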

Data Availability

The data used to support the findings of this study are available at https://github.com/SoftwareMaintenanceLab.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. I. Candela, G. Bavota, B. Russo, and R. Oliveto, “Using cohesion and coupling for software remodularization,” ACM Transactions on Software Engineering and Methodology, vol. 25, no. 3, p. 1, 2016. View at: Publisher Site | Google Scholar
  2. E. V. de Paulo Sobrinho, A. De Lucia, and M. de Almeida Maia, “A systematic literature review on bad smells-5 w’s: which, when, what, who, where,” IEEE Transactions on Software Engineering, vol. 2, 2018. View at: Google Scholar
  3. M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts, “Refactoring: improving the design of existing programs,” 1999. View at: Google Scholar
  4. A. Isazadeh, H. Izadkhah, and I. Elgedawy, Source Code Modularization: Theory and Techniques, Springer, Berlin, Germany, 2017.
  5. S. M. Brian, A Heuristic Search Approach to Solving the Software Clustering Problem, Drexel University, New York, NY, USA, 2002.
  6. R. S. Pressman, Software Engineering: A Practitioner’s Approach, Palgrave Macmillan, Berlin, Germany, 2005.
  7. B. S. Mitchell and S. Mancoridis, “On the automatic modularization of software systems using the bunch tool,” IEEE Transactions on Software Engineering, vol. 32, no. 3, pp. 193–208, 2006. View at: Publisher Site | Google Scholar
  8. N. S. Jalali, H. Izadkhah, and S. Lotfi, “Multi-objective search-based software modularization: structural and non-structural features,” Soft Computing, vol. 23, no. 21, pp. 11141–11165, 2019. View at: Google Scholar
  9. T. Lutellier, D. Chollak, J. Garcia et al., “Comparing software architecture recovery techniques using accurate dependencies,” in Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, pp. 69–78, IEEE, Berlin, Germany, 2015. View at: Google Scholar
  10. T. Lutellier, D. Chollak, J. Garcia et al., “Measuring the impact of code dependencies on software architecture recovery techniques,” IEEE Transactions on Software Engineering, vol. 44, no. 2, pp. 159–181, 2017. View at: Google Scholar
  11. J. Garcia, I. Ivkovic, and N. Medvidovic, “A comparative analysis of software architecture recovery techniques,” in Proceedings of the 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 486–496, IEEE, New York, NY, USA, 2013. View at: Google Scholar
  12. K. Praditwong, M. Harman, and X. Yao, “Software module clustering as a multi-objective search problem,” IEEE Transactions on Software Engineering, vol. 37, no. 2, pp. 264–282, 2011. View at: Publisher Site | Google Scholar
  13. K. Kohmoto, K. Katayama, and H. Narihisa, “Performance of a genetic algorithm for the graph partitioning problem,” Mathematical and Computer Modelling, vol. 38, no. 11–13, pp. 1325–1332, 2003. View at: Publisher Site | Google Scholar
  14. M. Kargar, A. Isazadeh, and H. Izadkhah, “Multi-programming language software systems modularization,” Computers & Electrical Engineering, vol. 80, p. 106500, 2019. View at: Publisher Site | Google Scholar
  15. S. Parsa and O. Bushehrian, “A new encoding scheme and a framework to investigate genetic clustering algorithms,” Journal of Research and Practice in Information Technology, vol. 37, no. 1, p. 127, 2005. View at: Google Scholar
  16. H. Izadkhah, I. Elgedawy, and A. Isazadeh, “E-cdgm: an evolutionary call-dependency graph modularization approach for software systems,” Cybernetics and Information Technologies, vol. 16, no. 3, pp. 70–90, 2016. View at: Publisher Site | Google Scholar
  17. T. Mens and T. Tourwé, “A survey of software refactoring,” IEEE Transactions on Software Engineering, vol. 30, no. 2, pp. 126–139, 2004. View at: Publisher Site | Google Scholar
  18. S. Ducasse and D. Pollet, “Software architecture reconstruction: a process-oriented taxonomy,” IEEE Transactions on Software Engineering, vol. 35, no. 4, pp. 573–591, 2009. View at: Publisher Site | Google Scholar
  19. F. Palomba, A. De Lucia, G. Bavota, and R. Oliveto, “Anti-pattern detection: methods, challenges, and open issues,” in Advances in Computers, pp. 201–238, Elsevier, Berlin, Germany, 2014. View at: Google Scholar
  20. J. Al Dallal and A. Abdin, “Empirical evaluation of the impact of object-oriented code refactoring on quality attributes: a systematic literature review,” IEEE Transactions on Software Engineering, vol. 44, no. 1, pp. 44–69, 2018. View at: Publisher Site | Google Scholar
  21. B. S. Mitchell and S. Mancoridis, “On the evaluation of the bunch search-based software modularization algorithm,” Soft Computing, vol. 12, no. 1, pp. 77–93, 2008. View at: Google Scholar
  22. M. Tajgardan, H. Izadkhah, and S. Lotfi, “Software systems clustering using estimation of distribution approach,” Journal of Applied Computer Science Methods, vol. 8, no. 2, pp. 99–113, 2016. View at: Publisher Site | Google Scholar
  23. Amarjeet and J. K. Chhabra, “Many-objective artificial bee colony algorithm for large-scale software module clustering problem,” Soft Computing, vol. 22, no. 19, pp. 6341–6361, 2018. View at: Publisher Site | Google Scholar
  24. A. Prajapati and J. K. Chhabra, “A particle swarm optimization-based heuristic for software module clustering problem,” Arabian Journal for Science and Engineering, vol. 43, no. 12, pp. 7083–7094, 2018. View at: Publisher Site | Google Scholar
  25. M. Harman and L. Tratt, “Pareto optimal search based refactoring at the design level,” in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 1106–1113, ACM, London, UK, 2007. View at: Google Scholar
  26. L. C. Briand, J. W. Daly, and J. K. Wust, “A unified framework for coupling measurement in object-oriented systems,” IEEE Transactions on Software Engineering, vol. 25, no. 1, pp. 91–121, 1999. View at: Publisher Site | Google Scholar
  27. O. Seng, J. Stammel, and D. Burkhart, “Search-based determination of refactorings for improving the class structure of object-oriented systems,” in Proceedings of the 8th Annual Conference on Genetic And Evolutionary Computation, pp. 1909–1916, ACM, London, UK, 2006. View at: Google Scholar
  28. S. Kebir, I. Borne, and D. Meslati, “A genetic algorithm-based approach for automated refactoring of component-based software,” Information and Software Technology, vol. 88, pp. 17–36, 2017. View at: Publisher Site | Google Scholar
  29. A. C. Kumari, K. Srinivas, and M. Gupta, “Software module clustering using a hyper-heuristic based multi-objective genetic algorithm,” in Proceedings of the 2013 3rd IEEE International Advance Computing Conference (IACC), pp. 813–818, IEEE, New York, NY, USA, 2013. View at: Google Scholar
  30. J. Huang and J. Liu, “A similarity-based modularization quality measure for software module clustering problems,” Information Sciences, vol. 342, pp. 96–110, 2016. View at: Publisher Site | Google Scholar
  31. G. Bavota, A. De Lucia, A. Marcus, and R. Oliveto, “Using structural and semantic measures to improve software modularization,” Empirical Software Engineering, vol. 18, no. 5, pp. 901–932, 2013. View at: Publisher Site | Google Scholar
  32. G. Bavota, M. Gethers, R. Oliveto, D. Poshyvanyk, and A. d. Lucia, “Improving software modularization via automated analysis of latent topics and dependencies,” ACM Transactions on Software Engineering and Methodology, vol. 23, no. 1, p. 1, 2014. View at: Publisher Site | Google Scholar
  33. G. Bavota, A. De Lucia, A. Marcus, and R. Oliveto, “Automating extract class refactoring: an improved method and its evaluation,” Empirical Software Engineering, vol. 19, no. 6, pp. 1617–1664, 2014. View at: Publisher Site | Google Scholar
  34. G. Gui and P. D. Scott, “Coupling and cohesion measures for evaluation of component reusability,” in Proceedings of the 2006 International Workshop on Mining Software Repositories, pp. 18–21, ACM, Berlin, Germany, 2006. View at: Google Scholar
  35. G. Bavota, A. De Lucia, and R. Oliveto, “Identifying extract class refactoring opportunities using structural and semantic cohesion measures,” Journal of Systems and Software, vol. 84, no. 3, pp. 397–414, 2011. View at: Publisher Site | Google Scholar
  36. D. Poshyvanyk, A. Marcus, R. Ferenc, and T. Gyimóthy, “Using information retrieval based coupling measures for impact analysis,” Empirical Software Engineering, vol. 14, no. 1, pp. 5–32, 2009. View at: Google Scholar
  37. G. Bavota, R. Oliveto, M. Gethers, D. Poshyvanyk, and A. De Lucia, “Methodbook: recommending move method refactorings via relational topic models,” IEEE Transactions on Software Engineering, vol. 40, no. 7, pp. 671–694, 2014. View at: Publisher Site | Google Scholar
  38. J. I. Maletic and A. Marcus, “Supporting program comprehension using semantic and structural information,” in Proceedings of the 23rd International Conference on Software Engineering, pp. 103–112, IEEE Computer Society, Berlin, Germany, 2001. View at: Google Scholar
  39. F. Palomba, A. Panichella, A. De Lucia, R. Oliveto, and A. Zaidman, “A textual-based technique for smell detection,” in Proceedings of the 2016 IEEE 24th International Conference on Program Comprehension (ICPC), pp. 1–10, IEEE, Berlin, Germany, 2016. View at: Google Scholar
  40. S. Mohammadi and H. Izadkhah, “A new algorithm for software clustering considering the knowledge of dependency between artifacts in the source code,” Information and Software Technology, vol. 105, pp. 252–256, 2019. View at: Publisher Site | Google Scholar
  41. R. Mahouachi, “Search-based cost-effective software remodularization,” Journal of Computer Science and Technology, vol. 33, no. 6, pp. 1320–1336, 2018. View at: Publisher Site | Google Scholar
  42. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: nsga-ii,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002. View at: Publisher Site | Google Scholar
  43. A. Ouni, M. Kessentini, M. Ó Cinnéide, H. Sahraoui, K. Deb, and K. Inoue, “More: a multi-objective refactoring recommendation approach to introducing design patterns and fixing code smells,” Journal of Software: Evolution and Process, vol. 29, no. 5, p. e1843, 2017. View at: Publisher Site | Google Scholar
  44. J. Al Dallal, “Predicting move method refactoring opportunities in object-oriented code,” Information and Software Technology, vol. 92, pp. 105–120, 2017. View at: Publisher Site | Google Scholar
  45. L. Mu, V. Sugumaran, and F. Wang, “A hybrid genetic algorithm for software architecture re-modularization,” Information Systems Frontiers, vol. 92, pp. 1–29, 2019. View at: Google Scholar
  46. M. Kargar, A. Isazadeh, and H. Izadkhah, “Improving the modularization quality of heterogeneous multi-programming software systems by unifying structural and semantic concepts,” The Journal of Supercomputing, vol. 76, no. 1, pp. 87–121, 2020. View at: Publisher Site | Google Scholar
  47. K. Mahdavi, A Clustering Genetic Algorithm for Software Modularisation with a Multiple Hill Climbing Approach, Brunel University, Berlin, Germany, 2005.
  48. M. C. Monçores, A. C. F. Alvim, and M. O. Barros, “Large neighborhood search applied to the software module clustering problem,” Computers & Operations Research, vol. 91, pp. 92–111, 2018. View at: Publisher Site | Google Scholar
  49. M. Kargar, A. Isazadeh, and H. Izadkhah, “Semantic-based software clustering using hill climbing,” in Proceedings of the 2017 International Symposium on Computer Science and Software Engineering Conference (CSSE), pp. 55–60, IEEE, London, UK, 2017. View at: Google Scholar
  50. J. Huang, J. Liu, and X. Yao, “A multi-agent evolutionary algorithm for software module clustering problems,” Soft Computing, vol. 21, no. 12, pp. 3415–3428, 2017. View at: Publisher Site | Google Scholar
  51. J. K. Chhabra, “Harmony search based remodularization for object-oriented software systems,” Computer Languages, Systems & Structures, vol. 47, pp. 153–169, 2017. View at: Google Scholar
  52. A. C. Kumari and K. Srinivas, “Hyper-heuristic approach for multi-objective software module clustering,” Journal of Systems and Software, vol. 117, pp. 384–401, 2016. View at: Publisher Site | Google Scholar
  53. A. Prajapati and J. K. Chhabra, “An efficient scheme for candidate solutions of search-based multi-objective software remodularization,” in Proceedings of the International Conference on Human Interface and the Management of Information, pp. 296–307, Springer, London, UK, 2016. View at: Google Scholar
  54. J. Hwa, S. Yoo, Y.-S. Seo, and D.-H. Bae, “Search-based approaches for software module clustering based on multiple relationship factors,” International Journal of Software Engineering and Knowledge Engineering, vol. 27, no. 7, pp. 1033–1062, 2017. View at: Publisher Site | Google Scholar
  55. A. Ramírez, J. R. Romero, and S. Ventura, “Interactive multi-objective evolutionary optimization of software architectures,” Information Sciences, vol. 463-464, pp. 92–109, 2018. View at: Publisher Site | Google Scholar
  56. M. Akbari and H. Izadkhah, “Hybrid of genetic algorithm and krill herd for software clustering problem,” in Proceedings of the 2019 5th Conference on Knowledge Based Engineering And Innovation (KBEI), pp. 565–570, IEEE, London, UK, 2019. View at: Google Scholar
  57. J. K. Chhabra, “Many-objective artificial bee colony algorithm for large-scale software module clustering problem,” Soft Computing, vol. 22, no. 19, pp. 6341–6361, 2018. View at: Google Scholar
  58. H. Izadkhah and M. Tajgardan, “Information theoretic objective function for genetic software clustering,” Multidisciplinary Digital Publishing Institute Proceedings, vol. 46, no. 1, p. 18, 2019. View at: Google Scholar
  59. O. Maqbool and H. Babri, “Hierarchical clustering for software architecture recovery,” IEEE Transactions on Software Engineering, vol. 33, no. 11, p. 759, 2007. View at: Publisher Site | Google Scholar
  60. A. Kuhn, S. Ducasse, and T. Gîrba, “Semantic clustering: identifying topics in source code,” Information and Software Technology, vol. 49, no. 3, pp. 230–243, 2007. View at: Publisher Site | Google Scholar
  61. P. Andritsos and V. Tzerpos, “Information-theoretic software clustering,” IEEE Transactions on Software Engineering, vol. 31, no. 2, pp. 150–165, 2005. View at: Publisher Site | Google Scholar
  62. A. Rathee and J. K. Chhabra, “Clustering for software remodularization by using structural, conceptual and evolutionary features,” Journal of Universal Computer Science, vol. 24, no. 12, pp. 1731–1757, 2018. View at: Google Scholar
  63. A. Shokoufandeh, S. Mancoridis, and M. Maycock, “Applying spectral methods to software clustering,” in Proceedings of the Ninth Working Conference on Reverse Engineering, pp. 3–10, Berlin, Germany, 2002. View at: Google Scholar
  64. V. Tzerpos and R. C. Holt, “Acdc: an algorithm for comprehension-driven clustering,” in Proceedings of the Proceedings Seventh Working Conference on Reverse Engineering, pp. 258–267, IEEE, Berlin, Germany, 2000. View at: Google Scholar
  65. J. Gottlieb, B. A. Julstrom, G. R. Raidl, and F. Rothlauf, “Prüfer numbers: a poor representation of spanning trees for evolutionary search,” in Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, pp. 343–350, Morgan Kaufmann Publishers Inc., Berlin, Germany, 2001. View at: Google Scholar
  66. I. Oliver, D. Smith, and J. R. Holland, “Study of permutation crossover operators on the traveling salesman problem,” in Proceedings of the Second International Conference on Genetic Algorithms and their Applications, L. Erlhaum Associates, Hillsdale, NJ, USA, 1987. View at: Google Scholar
  67. J. Wu, A. E. Hassan, and R. C. Holt, “Comparison of clustering algorithms in the context of software evolution,” in Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05), pp. 525–535, IEEE, Hillsdale, NJ, USA, 2005. View at: Google Scholar
  68. Z. Wen and V. Tzerpos, “An effectiveness measure for software clustering algorithms,” in Proceedings of the 12th IEEE International Workshop on Program Comprehension, pp. 194–203, IEEE, New Jersey, NJ, USA, 2004. View at: Google Scholar
  69. Z.-P. Tian, H.-Y. Zhang, J. Wang, J.-Q. Wang, and X.-H. Chen, “Multi-criteria decision-making method based on a cross-entropy with interval neutrosophic sets,” International Journal of Systems Science, vol. 47, no. 15, pp. 3598–3608, 2016. View at: Publisher Site | Google Scholar
  70. J. Cohen, “Things i have learned (so far),” in Proceedings of the 98th Annual Convention of the American Psychological Association Presented at the Aforementioned Conference, American Psychological Association, Boston, MA, USA, 1990. View at: Google Scholar
  71. F. Wilcoxon, S. Katti, and R. A. Wilcox, “Critical values and probability levels for the wilcoxon rank sum test and the wilcoxon signed rank test,” Selected Tables in Mathematical Statistics, vol. 1, pp. 171–259, 1970. View at: Google Scholar

Copyright © 2020 Masoud Aghdasifam et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

