#### Abstract

To address common problems in software quality evaluation methods, this paper first establishes a fuzzy software quality evaluation model based on the relationship between software quality subcharacteristics and indicators. Next, considering the uncertainty and individual deviations of expert judgments, it corrects the incomplete sortings given by experts, tests their consistency, and obtains an integrated sorting that gathers the different expert opinions through cyclic modification. Finally, it proposes the weighted mutation rate, which measures how balanced the development of the indicators is, and determines the weights of the evaluation indicators via the weighted mutation rate correction incompletion G1 method, which avoids the problem of how to integrate subjective and objective weights.

#### 1. Introduction

As software products become increasingly popular, the market continuously demands higher software quality. How to evaluate software quality has become the problem of greatest concern to users and to managers of software organizations, because evaluation results can guide users in purchasing and using software and can also guide developers in building high-quality software products [1–3].

Previous studies have shown that, owing to the characteristics of software itself and the limits of public understanding, software quality evaluations are always vague and uncertain. Because of this fuzziness, domestic and overseas scholars began to use the fuzzy comprehensive evaluation method to address core issues of software quality evaluation [4–8]. In the fuzzy comprehensive evaluation method, constructing the membership function is a difficult and complex process, and the way the weights are constructed is often not reasonable. To match the continuous and gradual nature of quality, semitrapezoid and trapezoidal distributions are currently used to construct the membership function [6–8]. As for weighting methods, current studies focus on the following approaches.

(a) Subjective weighting methods: the analytic hierarchy process (AHP) and the order relation analysis method (G1) [9] are representative. Their characteristic is that the index weight information comes from the subjective judgment and experience of experts.

(b) Objective weighting methods: representative methods are the entropy method [10] and mathematical programming models [11–13]. Here the index weight information comes from the indicator data, but the results are still affected by subjective factors, which is the inadequacy of objective weighting.

(c) Combinations of objective and subjective weighting: the main representatives are addition synthesis [9] and multiplication synthesis [11–13], but it is not known whether the addition and multiplication methods are reasonable. Literature [14] proposed a combination weight model based on standard deviation correction G1, which guarantees that the combination weights reflect both expert advice and data information. However, it requires each expert to give an importance sorting of all indicators and does not consider that expert sortings may be incomplete.

Most of the above combination weighting methods require every evaluation expert to consider the same set of evaluation indexes and to provide complete evaluation information. In realistic software quality evaluation, however, each expert's knowledge and experience, understanding of the evaluation criteria, psychological scale, and environment lead to uncertainty, individual judgment bias, and incomplete information in the evaluation results. Group evaluation should be used to eliminate the deviation and uncertainty of individual results and to reduce the impact of their randomness and contingency. The uncertain information is then corrected to increase the number of usable expert results, the consistency [15] or accuracy [16] of all expert results is tested, the results that fail the test are discarded, and the remainder is integrated into the ideal evaluation information. Statistically, the more experts there are, the smaller the dispersion of individual results and the closer the group summary is to the true value of the evaluated object. In practice, however, the number of experts is limited by factors such as the evaluation period and cost, and the number of peer experts in the strict sense is itself limited, so it is difficult to recruit as many experts as desired. Practice shows that once the number of experts approaches 15, adding more experts has very little effect on evaluation accuracy [16, 17]. Therefore, the number of experts in an evaluation is usually 7 to 15.

Following the ISO/IEC 9126 standard, this paper determines the evaluation indexes used to measure software quality and establishes a complete index system as the foundation of software quality evaluation. It then designs a software quality evaluation method that fuzzifies the index evaluation criteria by constructing membership functions from fuzzy mathematics and thereby obtains and quantifies the fuzzy quality of the indicators. To take the different preferences of different experts into account, the paper corrects the incomplete information given by some experts when it meets certain conditions and integrates it with the complete order relations given by the other experts; the resulting integrated order relation passes the consistency test and has the highest level of consistency. The paper then proposes a weighted mutation rate formula that measures the degree of development difference, uses it to correct the incomplete-information G1 method, and obtains combination weights based on weighted mutation rate correction incompletion G1. The keys to this method are how to construct the membership function of a software index, how to deal with the different preferences of different experts, and how to properly integrate weight information from subjective and objective factors.

#### 2. Software Evaluation Index System

##### 2.1. Software Quality Index System Decomposition

ISO/IEC 9126 [18] is a popular international standard for software metrics and software quality evaluation. It describes software quality with six quality characteristics: functionality, reliability, usability, maintainability, efficiency, and portability. Each quality characteristic is described by quality subcharacteristics; for example, the subcharacteristics of functionality are suitability, accuracy, interoperability, security, and compliance. Each subcharacteristic in turn contains relevant measurement indexes. Characteristics, subcharacteristics, and measurable indexes constitute the three-level index model of software quality.

The definitions, methods, and standards of measurement in ISO/IEC 9126 are general concepts. To make the evaluation more concrete and operable, the measures in the standard are decomposed into indicators and data items, forming the evaluation index system of software quality shown in Figure 1. Each index contains *n* data items whose values are collected during software testing; substituting the data item values into the index formula gives the indicator value. For example, the subcharacteristic suitability contains four indicators: functional attainment ratio, functional specification change ratio, precise input-output definition ratio, and project documentation ratio. Table 1 gives an actual example for the functional attainment ratio index [7, 19].

##### 2.2. Measurement and Quantification Based on the Index System

One of the key issues of software quality evaluation is to determine the quality of the evaluation indicators. The common method is as follows: first, determine the evaluation standard of the indicators and divide indicator quality into four grades (Poor, Average, Good, and Excellent); second, determine the index thresholds through the expertise method , , , , and ; that is to say, define the value range of the indicator on each level as , , , , and ; finally, obtain the quality of the index by comparing its value with the thresholds [7]. This way of judging index quality has the following shortcomings: (a) it disagrees with the continuous and gradual nature of quality; (b) it loses evaluation information; (c) it quantifies quality poorly.

The above problems can be solved by fuzzing the evaluation standards of index through constructing the membership function of index quality on all grades. The paper uses semitrapezoid distribution and trapezoidal distribution [20, 21] to construct the membership function of quality grade. Set the domain of discourse , and* A* is a fuzzy subset of indicator on the domain of discourse. Constructed membership functions, respectively, are
of which , , , and are central values of intervals , , , and .
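To illustrate how such grade membership functions behave, the following Python sketch builds semitrapezoid functions for the two end grades and piecewise-linear functions for the two middle grades from the central values of four threshold intervals. The five cut points, the function name `memberships`, and the exact piecewise shape (linear interpolation between adjacent central values) are assumptions made for illustration; they are not taken from formula (1).

```python
def memberships(x, thresholds):
    """Return membership degrees (poor, average, good, excellent) of value x.

    thresholds: five increasing cut points a0 < a1 < a2 < a3 < a4 (assumed).
    The central value c_k of each interval [a_{k-1}, a_k] is where grade k
    has full membership; between two centres membership shifts linearly.
    """
    a = thresholds
    c = [(a[k] + a[k + 1]) / 2 for k in range(4)]  # central values c1..c4
    mu = [0.0, 0.0, 0.0, 0.0]
    if x <= c[0]:            # left semitrapezoid: fully "poor"
        mu[0] = 1.0
    elif x >= c[3]:          # right semitrapezoid: fully "excellent"
        mu[3] = 1.0
    else:
        for k in range(3):   # find the pair of centres that bracket x
            if c[k] <= x <= c[k + 1]:
                t = (x - c[k]) / (c[k + 1] - c[k])
                mu[k] = 1.0 - t
                mu[k + 1] = t
                break
    return mu
```

With this shape the four degrees always sum to 1, so a value lying between two grade centres is shared between the two neighbouring grades instead of being forced into one of them.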

For example, with regard to the functional attainment ratio in Table 1, assume its threshold value obtained by expertise method [8] as , then , , , , and , which is substituted into (1) and obtain the membership function of indicator on each quality grade. If we substitute the measured indicator value into the membership degree equation, then obtain that the membership degrees of the indicator are , , , and . So the fuzzy quality of the indicator is

The normalized quality , calculating the membership degree on different quality grades is a good solution of unclear boundary among the quality grades.

In order to facilitate the comparison of similar software products, the degree of excellent, good, average, and poor can be further quantized. Set is the corresponding score set of the remark set {excellent, good, average, poor}, that is, the score of excellent, good, average, and poor are 4, 3, 2, and 1, respectively. Set to represent the quantized quality of the th indicator of th evaluated object and to represent the fuzzy quality of the th indicator of th evaluated object; then the quantitative score [8] of this index of specific is
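One plausible reading of this quantification, sketched below in Python, is a membership-weighted average of the grade scores 4, 3, 2, and 1. The function name `quantified_score`, the dictionary layout, and the normalisation step are assumptions for illustration, not the paper's exact formula.

```python
# Scores for the remark set {excellent, good, average, poor}, as stated in the text.
SCORES = {"excellent": 4, "good": 3, "average": 2, "poor": 1}

def quantified_score(fuzzy_quality):
    """Membership-weighted score of one indicator.

    fuzzy_quality: dict mapping grade name -> membership degree.
    The degrees are normalised here so the result stays within [1, 4]
    (an assumption of this sketch).
    """
    total = sum(fuzzy_quality.values())
    if total == 0:
        raise ValueError("all membership degrees are zero")
    return sum(SCORES[g] * m for g, m in fuzzy_quality.items()) / total
```

For instance, an indicator that is half "good" and half "average" receives a score of 2.5, which lets similar software products be compared on a single numeric scale.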

#### 3. Correction Incompletion Group G1 Combination Weights Model

##### 3.1. Difficulty and Solution

*(1) Problems of Expert Group Decision Making.*

*First Problem*. Evaluation experts often come from different fields, organizations, and sectors, and each has different knowledge and experience, so disagreement often emerges when experts compare the importance of two indicators. For a single expert, it is fairly easy to decide which of two indicators is more important, but different experts often reach conflicting conclusions about the same pair. A comprehensive evaluation with many experts and multiple indicators involves pairwise comparisons of different indicators by different experts, which is more troublesome and more prone to inconsistency.

*Second Problem*. Opinions diverge even more when multiple experts directly assign the ratio that states how many times more important one indicator is than another. In a comprehensive evaluation with numerous experts and multiple indicators, directly determining this ratio is clearly harder than determining which indicator is more important.

*(2) Solutions.*

*First Thought*. Each expert is asked to sort the indicators by importance. Because experts may omit indicators according to the actual situation, we integrate the incomplete expert sortings that meet the integration condition and construct a complete indicator set. We then unify the importance ordering of each pair of indicators across the different expert indicator sets through consistency testing and cyclic modification, obtaining a unified sorting that gathers the different experts' opinions.

*Second Thought*. On the basis of the first thought, we determine the ratio of importance between two indicators by comparing their weighted mutation rates (see Section 3.2.3). Sorting two indicators is easy, whereas assigning a numerical importance ratio is harder and tends to cause conflicts, so we avoid asking the experts for the ratio.

##### 3.2. Establishment of G1 Combination Weights Model of Weighted Mutation Rate Correction Incompletion

###### 3.2.1. Introduction of Traditional Order Relation Analysis Method (G1)

The order relation analysis method (G1) is a typical subjective weighting method. In this method, the index weight information comes entirely from the subjective experience of the experts, and each expert is asked to evaluate all indicators; expert preferences are not considered and the indicator data are not reflected. The specific steps are as follows.

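*(1) Weight of Index Layer to Criterion Layer*

(1) Experts determine the order relation of the indicators, $x_1 > x_2 > \cdots > x_m$.

(2) Experts give a rational assignment $r_k$ of the importance degree ratio of adjacent indicators $x_{k-1}$ and $x_k$, that is, $r_k = w_{k-1}/w_k$. The rational assignments are shown in Table 2.

(3) According to the values $r_k$ given by the experts, the G1 weight of the $m$th indicator on the criterion layer is as follows:

$$w_m = \left(1 + \sum_{k=2}^{m} \prod_{i=k}^{m} r_i\right)^{-1}. \quad (4)$$

(4) The weights of the other indicators can be obtained from the weight $w_m$:

$$w_{k-1} = r_k w_k, \quad k = m, m-1, \ldots, 2, \quad (5)$$

of which $w_{k-1}$ represents the weight of the $(k-1)$th indicator with respect to the criterion layer.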
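The G1 steps can be sketched in Python as follows. The sketch assumes the indicators are already sorted from most to least important and that the ratios are given as `[r_2, ..., r_m]`, where each ratio is the weight of an indicator divided by the weight of the next one in the order; the function name `g1_weights` is hypothetical.

```python
def g1_weights(ratios):
    """Weights for m indicators sorted from most to least important.

    ratios: [r_2, ..., r_m], where r_k = w_{k-1} / w_k >= 1 is the importance
    ratio of adjacent indicators in the sorted order.
    """
    m = len(ratios) + 1
    # Accumulate prod_{i=k}^{m} r_i for k = m, m-1, ..., 2.
    total, prod = 0.0, 1.0
    for r in reversed(ratios):
        prod *= r
        total += prod
    weights = [0.0] * m
    weights[-1] = 1.0 / (1.0 + total)      # weight of the least important indicator
    for k in range(m - 1, 0, -1):          # back-substitute w_{k-1} = r_k * w_k
        weights[k - 1] = ratios[k - 1] * weights[k]
    return weights
```

Calling `g1_weights([1.2, 1.4])` returns three weights that sum to 1 and decrease along the order relation; with all ratios equal to 1 the weights are uniform.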

*(2) Weight of Index Layer to General Objective Layer*. Set as the weight of th indicator to general objective layer on th criterion layer; is the weight of th indicator to th criterion layer on th criterion layer; is the weight of th criterion layer to general objective layer. Then the weight of the indicator to general objective layer is

###### 3.2.2. Weighted Mutation Rate Correction Incompletion G1 Combination Weights

In current studies, linear weighting is generally used to integrate subjective and objective weights. This paper proposes a nonlinear weighting method based on the weighted mutation rate. We construct the correction incompletion G1 combination weights model based on the weighted mutation rate to determine the combination weights from a new perspective. The model not only considers experts' preferences and reflects their opinions but also includes objective data information. The detailed steps are as follows.

Each expert sorts the indicators in the evaluation set by importance. The incomplete order relations given by some experts are then corrected to obtain complete sortings. Next, consistency testing and information integration are carried out together with the order relations given by the other experts to obtain an ideal sorting, which takes the experts' preferences into account and reflects their opinions; see Sections 3.2.4 and 3.2.5.

Determine the importance degree ratio of each pair of adjacent indicators by calculating the weighted mutation rate of the indicators (see Section 3.2.3), as given by formula (7); this reflects the data information of the indicators and gives the method its objectivity. The other steps are the same as in the traditional G1 method; see Section 3.2.1.

###### 3.2.3. Definition of Weighted Mutation Rate and Its Metric Property

To measure the degree of difference among a set of data, this paper proposes a new measure, the weighted mutation rate. It captures the variation in the data and is computed from objective data only, so the degree of data difference can be judged directly from its value.

Define the weighted mutation rate as of which, was the th indicator value of th evaluated objected, was the th mean of indicator, was the percentage of indicator data, and indicated the weighted standard error of th indicator.

It is easy to prove that weighted mutation percentage rate has good metric quality. Simplify formula (8): Then simplify formula (9): As , if , so formula (10) could be simplified as From formula (11), we knew that . And formula (11) was greater than or equal to 0. That means that . Therefore, the greater weighted mutation rate is, the higher the degree of development difference among the various indicators is. It is easy to prove that is the necessary and sufficient conditions for absolutely equal development among various indicators. That is . When , , , ; Otherwise, if , it’s easy to prove that , , . That means that was the necessary and sufficient conditions that had absolute differences among the development of various indicators. That is, most indicator items had no progress; only one indicator developed. It was clear that weighted mutation rate had the similar value and meaning as Gini coefficient [22, 23].

###### 3.2.4. Correction Method of Expert Incomplete Information Order Relation

To reduce the dispersion of individual evaluations and improve the objectivity of the assessment, selecting experts with both professional competence and integrity is the key to sound peer review. Moreover, to assess the value of the object fully and objectively, the composition of the expert group should be representative. The experts should come from different regions and work units and have graduated from different schools, and sometimes they also need to be representative of the industry, subject area, academic school, and other aspects, so that individual deviations offset each other when the group results are summarized.

Choose experts to rank indicators, assuming experts give the complete sorting, while there were varying degrees of incomplete indicators in the other sorting of experts. Specific correction steps were as follows.

*1.* Expert classification and modification condition: classify the experts who gave the same number of indexes into one category. If every indicator is chosen by at least one expert in the category, the incomplete information of those experts can be corrected and tested for consistency. Otherwise, the expert sortings of that category are rejected.

*2.* Employ the sorting and scoring method [15] to convert the incomplete information given by each expert into scores:

, , was the location of th indicator in th sorting of expert. If there is parallel sorting, there was a need to normalize the results of expert, conducting “skip” processing to the following evaluated objects of the parallel sorting. Then we adjusted the same number to the mean of corresponding sequence number. For example, the sorting result of an expert was , then the normalized sequence number would be .
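The "skip" treatment of parallel sortings described above matches what is commonly called fractional (average) ranking. The Python sketch below shows one way to do it; the function name `average_ranks` and the sample input are hypothetical, and the input is assumed to hold one position per indicator with tied indicators sharing the same number.

```python
def average_ranks(tied_ranks):
    """Normalise a ranking that contains ties.

    tied_ranks: position given to each indicator, where tied indicators
    share the same number (e.g. [1, 2, 2, 3]).
    Tied indicators occupy consecutive sequence numbers and each receives
    the mean of those numbers; later indicators skip ahead accordingly.
    """
    order = sorted(range(len(tied_ranks)), key=lambda i: tied_ranks[i])
    result = [0.0] * len(tied_ranks)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next indicator has the same position.
        while end + 1 < len(order) and tied_ranks[order[end + 1]] == tied_ranks[order[pos]]:
            end += 1
        mean_seq = (pos + 1 + end + 1) / 2   # mean of sequence numbers pos+1 .. end+1
        for i in order[pos:end + 1]:
            result[i] = mean_seq
        pos = end + 1
    return result
```

For the hypothetical input `[1, 2, 2, 3]`, the two tied indicators share sequence numbers 2 and 3 and each receive 2.5, while the last indicator moves to position 4.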

*3.* Calculate the mean value of indicators in different methods:

Use the mean of each indicator as the score of that indicator wherever it is missing, and insert the missing indicators into each expert's sorting according to numerical size. This yields a complete index order relation for each expert. If two indicators have the same mean, calculate the weighted mutation rate of each indicator's scores under the different expert order relations; the indicator with the smaller rate ranks higher.

*4.* The corrected complete indicator order relations of different experts may differ, but the difference should not be too large for the same indicators. We therefore run a consistency test on the sortings based on the Spearman rank correlation coefficient and discard any sorting that fails it. Let $R_{kj}$ be the position of the $j$th indicator in the $k$th sorting and $n$ the number of indicators; then the Spearman rank correlation coefficient [15] of the $k$th and $l$th sortings is

$$\rho_{kl} = 1 - \frac{6\sum_{j=1}^{n}\left(R_{kj} - R_{lj}\right)^{2}}{n\left(n^{2}-1\right)}. \quad (14)$$

The discard criterion is as follows. Calculate the Spearman rank correlation coefficient between one expert sorting and each of the other expert sortings. If the mean value is greater than or equal to 0.7, the sorting meets the reliability condition and passes the consistency test; it then becomes an effective sorting, equivalent to a complete index order relation. Otherwise, the expert sorting is discarded.
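The discard criterion can be sketched in Python as below, using the standard no-ties form of the Spearman rank correlation coefficient. The function names and the sample rankings in the usage note are hypothetical; only the 0.7 threshold comes from the text.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation of two rankings of the same n indicators (no ties)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def passes_consistency(candidate, others, threshold=0.7):
    """True if the mean correlation of `candidate` with the other sortings >= threshold."""
    mean_rho = sum(spearman(candidate, o) for o in others) / len(others)
    return mean_rho >= threshold
```

As a usage example, a sorting `[1, 2, 3, 4]` compared with `[2, 1, 3, 4]` gives a coefficient of 0.8 and would be kept, whereas a fully reversed sorting gives -1 and would be discarded.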

###### 3.2.5. Establishment of Ideal Sorting

Conduct the consistency testing by formula (14) on the above expert correction sorting that passed the consistency testing and complete sortings of expert order relation. And then score the expert sorting that passed the consistency testing by formula (12), if was the score of the th indicator of the th expert order relation, was the synthesis mean score of the th indicator; then

Reordering the indicators by the size of their mean scores gives the integration order relation. By Theorem 1, the integration order relation not only satisfies the Spearman consistency condition but also achieves the optimal degree of consistency. Therefore, the integration order relation is the ideal sorting.
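A minimal Python sketch of this integration step follows. It assumes each retained expert sorting is a complete mapping from indicator name to position and that the score of an indicator is simply its position, so a smaller mean means a more important indicator; the paper's score formula (12) may use a different scale. The function name `integrate_sortings` and the sample data in the usage note are hypothetical.

```python
def integrate_sortings(rankings):
    """Combine several complete rankings into one by mean position.

    rankings: list of dicts {indicator name: position}, all over the same
    indicators. A smaller mean position means a more important indicator
    (an assumption of this sketch).
    Returns the integrated order and the mean position of each indicator.
    """
    names = list(rankings[0])
    mean_pos = {n: sum(r[n] for r in rankings) / len(rankings) for n in names}
    return sorted(names, key=lambda n: mean_pos[n]), mean_pos
```

Given three sortings of indicators `a`, `b`, and `c` in which `a` is first twice and second once, the integrated order places `a` first because its mean position is the smallest.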

Theorem 1. *If different order relations over the same indicators satisfy the consistency condition of the Spearman rank correlation coefficient, then the integrated order relation A obtained above also satisfies that condition and, moreover, has the optimal degree of consistency.*

The proof of Theorem 1 uses the following lemma.

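Lemma 2. *If $\overline{x^{2}} = \frac{1}{n}\sum_{i=1}^{n} x_i^{2}$ and $\bar{x}^{2} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^{2}$ are the mean of the squares and the square of the mean, respectively, then*

$$\overline{x^{2}} \ge \bar{x}^{2},$$

*with equality if and only if $x_1 = x_2 = \cdots = x_n$.*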

*Proof. * represented the location of the th indicator by the th expert, was the mean of integration order relation and the Spearman rank correlation coefficient ranked by experts, represented the mean of the th expert order relation and Spearman rank correlation coefficient ranked by other experts.

By (14), the mean of Spearman rank correlation coefficient can be deduced:
Then the mean of the integration order relation as follows:

Contrasted the consistency degree of the integration order relation and expert sorting:

Based on Lemma 2, , and other items were greater than or equal to 0; then ; namely, the mean of Spearman rank correlation coefficient of the integration order relation is optimal.

Some scholars [24–26] obtain the comprehensive sorting by weighting each expert according to the degree of consistency: the greater an expert's degree of consistency, the greater the weight. The integration sorting obtained by this weighted method is certain to satisfy the consistency condition, which can be proved in the same way as above, but it is not necessarily the best sorting; the consistency degree of the method proposed in this paper is higher than that of the weighted method.

##### 3.3. Advantage of Weighted Mutation Rate Correction Incompletion G1 Method

In conclusion, the weighted mutation rate correction incompletion G1 combination weights have the following advantages over other weighting methods.

(1) The evaluation process fully considers the preferences of the experts and corrects their possibly incomplete information. A correction method based on incomplete information is constructed, and the corrected sortings are integrated with the complete sortings of the other experts into the ideal sorting. The method makes full use of the experts' knowledge and experience and reflects the uncertainty and incompleteness of information in the evaluation process.

(2) Comparing the weighted mutation rates of the indicators determines the importance degree ratio of adjacent indicators. This combines expert experience and indicator data in a reasonable way: the combination weights reflect both the expert information (the indicator sorting) and the data information (the importance ratios determined by the weighted mutation rate). It also avoids the arbitrary choice of importance ratios and the thorny issue of how to distribute the objective and subjective weights.

(3) The combination weighting method is widely applicable. The only requirement is that quantified indicator data exist. The weighted mutation rate is calculated directly from the data, and the combination weights are obtained by combining it with the ideal sorting integrated from the importance sortings given by the evaluators.

#### 4. Evaluation Equation

The comprehensive evaluation score could be obtained by weighted summation of quantified scores of indicators in Section 2.2 and combination weights in Section 3.2.2. Set : as the evaluated score of the th evaluated object, as the incompletion G1 method combination weights of the th indicator of the th evaluated object, and as the quantified score of the th indicator of the th evaluated object; then the evaluation equation was as shown in
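The weighted summation can be sketched in Python as follows. The sketch assumes a single weight vector shared by all evaluated objects and weights that sum to 1; the function name `evaluate` and the numbers in the usage note are hypothetical and are not results from the paper.

```python
def evaluate(score_matrix, weights):
    """Comprehensive score of each evaluated object.

    score_matrix[i][j]: quantified score of indicator j for object i.
    weights[j]: combination weight of indicator j (assumed to sum to 1).
    """
    if any(len(row) != len(weights) for row in score_matrix):
        raise ValueError("each row needs one score per weight")
    return [sum(w * q for w, q in zip(weights, row)) for row in score_matrix]
```

For example, `evaluate([[4, 2], [1, 3]], [0.5, 0.5])` yields one score per object, and the object with the larger score ranks higher.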

#### 5. Application of Comprehensive Evaluation of Software Quality

##### 5.1. Principle of Evaluation of Software Quality

As shown in Figure 1, each subcharacteristic of software quality includes a number of quality indicators; that is, the quality of a subcharacteristic is decided by the quality of the indicators it contains and can be obtained by combining them. Therefore, based on the quality of the indicators and the corresponding weights, the quality of subcharacteristics and characteristics can be evaluated through the fuzzy comprehensive evaluation method.

On the basis of the index system, this paper studies a fuzzy comprehensive evaluation method based on weighted mutation rate correction incompletion G1 combination weights. First, the values of the data items in the index system come from the actual software development and testing process, which is the basis for quantitative evaluation. Substituting the data items into the membership degree formulas gives the fuzzy quality and quantified score of each indicator. The index quality therefore draws not only on expert experience but also on actual measurement data, which makes it more credible. Second, to handle the possibly different situations of the experts together with the objective quantitative data, the paper puts forward a weighting method that integrates subjective and objective evaluation, namely the weighted mutation rate correction incompletion G1 combination weights. Finally, according to the relationship between the subcharacteristics of software quality and the indicators, the fuzzy comprehensive evaluation method is used to evaluate the quality of each subcharacteristic, from which the characteristic quality and overall quality of the software are obtained. The comprehensive evaluation scheme is shown in Figure 2.

##### 5.2. Process of Evaluation of Software Quality

The following took suitability, the subcharacteristic of functionality as the example to introduce how to evaluate the subcharacteristic by fuzzy comprehensive evaluation method. The subcharacteristic suitability contains four indicators [8]: functional attainment ratio, functional specification change ratio, precise input-output definition ratio, and project documentation ratio, signified with , , , respectively. Set the factor set and judgment set . Assume that represented the quantified quality of the th indicator of the th evaluated object and represented the fuzzy of the th indicator of the th evaluated object; there are three evaluated software, , , .

###### 5.2.1. Fuzzy Quantified Score of Software Quality

For each indicator of the factors set , evaluated by the method in Section 2.2, tested and substituted into the membership degree formula (1), we obtained the fuzzy quality of each indicator of the factor set separately, , , , , , , , , , , , , then substituted into formula (3), obtained the matrix of quantized score:

###### 5.2.2. Solution of Combination Weights

* 1* (determination of ideal sorting of indicator importance degree). If we invited 9 experts, of which three are complete index order relations of expert, three are order relations of expert missing one item, another three are order relations of expert missing two items, signified with , , and , respectively, expert sorting of category was , , ; expert sorting of category was , , ; and expert sorting of category was , , from Section 3.2.4, we knew category was regarded as valid as complete index sorting; category existed indicator omission but all indicators satisfied the correction condition, so retained; category did not satisfied the correction condition, should be discarded.

Substituted category into formulas (12) and (13) and obtain the mean value of each indicator:

Contrast the mean value with the incomplete information of expert sorting score of category, respectively, insert the deletion indicator into the expert incomplete sorting by size, and contrast the weighted mutation rate if the same size, and all index order relations of each expert were as follows:

Calculate the Spearman rank correlation coefficient of one expert with the other two experts and obtain , , and , since 0.8 is greater than the critical value 0.7, the correction sorting of expert and expert passed the consistency testing and the correction sorting of expert was discarded. Conduct Spearman rank correlation coefficient test on these two correction sortings that passed the consistency testing and three complete expert sortings and obtain the mean value Using the formulas (12) and (13) to calculate the mean value of expert and expert passed the consistency testing and expert : , , , , sorted according to the size, then the overall sorting result was , from Theorem 1, we knew it was the ideal sorting.

* 2* (calculation of weighted mutation rate of index). We used formula (8) to calculate the weighted mutation rate of quantitative score matrix of indicators in Section 5.2.1:

* 3* (calculation of importance degree ratio of indicators). We used formula (7) to calculate the value of weighted mutation rate of adjacent indicators:

* 4* (determination of index weights). We substituted into formulas (4) and (5) and obtained the index weight vector .

###### 5.2.3. Comprehensive Evaluation Result

Substitute the score matrix and the weight vector into the evaluation formula (20), obtain overall evaluation result of each software , , and , and from we knew that the subcharacteristic suitability of functionality of software was optimal, followed by , .

The quality of each software characteristic is evaluated with the same method as the subcharacteristics: first evaluate all the subcharacteristics, and then use the fuzzy comprehensive evaluation method based on weighted mutation rate correction incompletion G1 combination weights to evaluate the characteristic.

#### 6. Conclusions

This paper considers the situation in which each expert may have a different individual evaluation set and expresses the evaluation with uncertain information, which lets experts state their subjective judgments flexibly. The evaluation results of the experts are integrated into the ideal sorting, and the importance ratios of the indicators are then determined by the weighted mutation rate, so that the index weights properly include both subjective opinions and objective test data. The fuzzy quality and quantified score of the indicators are calculated by constructing membership degrees from fuzzy mathematics, and the comprehensive evaluation result is obtained by weighted summation with the combination weights. The method combines quantitative and qualitative evaluation, takes full advantage of expert knowledge and experience as well as objective test data, and suits the poor visibility and measurement difficulty of software. The evaluation process also conforms to the human process of thinking and judgment and shows good flexibility, effectiveness, and rationality.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgment

This work was supported by the National Natural Science Foundation of China (no. 71073056).