#### Abstract

The complementary judgment matrix (CJM) method is an MCDA (multicriteria decision aiding) method based on pairwise comparisons. As in AHP, the decision-maker (DM) can specify his/her preferences using pairwise comparisons, both between different criteria and between different alternatives with respect to each criterion. The DM specifies his/her preferences by allocating two nonnegative comparison values so that their sum is 1. We measure and pinpoint possible inconsistency by *inconsistency errors*. We also compare the consistency of CJM and AHP through simulation. Because preference judgments are always more or less imprecise or uncertain, we introduce a way to represent the uncertainty through stochastic distributions, and a computational method to treat the uncertainty. As in Stochastic Multicriteria Acceptability Analysis (SMAA), we consider different uncertainty levels: precise comparisons, imprecise comparisons with a stochastic distribution, and missing comparisons between criteria. We compute rank acceptability indices for the alternatives, describing the probability that an alternative obtains a given rank considering the level of uncertainty, and study the influence of the uncertainty on the SMAA-CJM results.

#### 1. Introduction

The complementary judgment matrix (CJM) method is an MCDA (multicriteria decision aiding) method based on hierarchical decomposition of the decision criteria into subcriteria, evaluation of preferences using pairwise comparisons, and aggregating the results into an overall evaluation of the alternatives. The earliest publications about the CJM method are by Lin and Xu [1], Su [2], and Dong et al. [3]. As in AHP [4], the criteria form a hierarchical tree-like structure, where the root node is the overall decision problem, and the branches at each node correspond to criteria or subcriteria. The leaf nodes correspond to the different decision alternatives. Figure 1 illustrates a school selection problem as an example.

At each node of the hierarchy, the decision-maker (DM) performs pairwise comparisons between each pair of criteria or subcriteria. At the bottom level, the DM is asked to compare each pair of alternatives with respect to each criterion. Thus, the DM evaluates through pairwise comparisons both the relative importance of different criteria, and the performance of each alternative with respect to these criteria.

The CJM method differs from AHP mainly in five aspects: the pairwise comparisons are expressed differently, a different numerical scale is used to represent the verbal preference statements, a computationally simpler and more intelligible procedure is used to aggregate the comparisons, the set of comparisons can be incomplete, and individual inconsistent comparisons can be automatically detected. In the CJM method, the DM assigns to each pair of compared entities (*i*, *j*) nonnegative weights $a_{ij}$ and $a_{ji}$ that are *complementary*, i.e., $a_{ij} + a_{ji} = 1$. When comparing criteria, the ratios of these weights correspond to trade-off ratios. We should point out that some variants of the CJM method have a different interpretation for $a_{ij}$. For example, Wang and Guo [5] treat crisp values $a_{ij}$ as fuzzy membership function values for the expression that ‘*i* is more important than *j*’. Such an interpretation is different from our assumption that the $a_{ij}$ are related to trade-off ratios. Fuzzy techniques have also been applied with the AHP method [6–11].

Subjective comparison values are always more or less uncertain or imprecise. In particular in group decision-making, it may be difficult to represent the preferences of multiple DMs by precise comparison values. Fuzzy set theory has been employed to cope with the uncertainty and vagueness involved in conducting the comparisons between components of a decision model [12, 13]. Some previous works have also treated such imprecision using either interval numbers or fuzzy numbers [1, 14–17]. Alternatively, uncertain or imprecise information in MCDA can be represented by stochastic variables and Monte-Carlo analysis [18, 19]. Durbach et al. [20] developed uncertainty modelling techniques in AHP as an extension of Stochastic Multicriteria Acceptability Analysis (SMAA). We present in this paper a way to represent such imprecise or uncertain preference information in the CJM method through stochastic distributions and a computational method to treat this information in the analysis. We do this by introducing a new variant of SMAA [21–23] that applies the decision model of the CJM method. As in SMAA, we compute various descriptive measures for the problem. SMAA is based on simulating uncertain criteria evaluations and preferences and collecting statistics on the performance of each alternative. The DMs are given *rank acceptability indices* for each alternative, describing the variety of different preferences that support an alternative for the best rank or any particular rank. This information can be used for classifying the alternatives into more or less acceptable ones and those that are not acceptable at all. *Pairwise winning indices* describe the probability of one alternative to be more preferred than another. SMAA also computes *central weights* describing typical trade-off weights between criteria that make an alternative the most preferred one. It is also possible to measure with *confidence factors* whether the performance of alternatives has been assessed accurately enough for decision-making.

This paper is organized as follows. Section 2 presents the CJM method with some extensions. Section 3 presents the new SMAA-CJM method. Section 4 demonstrates the method using a small example. Section 5 compares the consistency of preference statements expressed using the CJM and AHP scales through simulation. This is followed by discussion and conclusions.

#### 2. The Complementary Judgment Matrix Method

##### 2.1. Expressing Judgments

At each node of the criteria hierarchy (see Figure 1), the DM performs pairwise comparisons between each pair of criteria or subcriteria to express their relative importance. At the bottom level, the DM is asked to compare the relative performance of each pair of alternatives with respect to each (sub-) criterion. The DM expresses the intensity of his/her preference either by choosing among verbal preference statements, or by giving directly the complementary positive weights $a_{ij}$ and $a_{ji}$ such that

$$a_{ij} + a_{ji} = 1. \tag{1}$$

The ratio of the complementary weights corresponds to trade-off ratios, i.e.,

$$\frac{a_{ij}}{a_{ji}} = \frac{w_i}{w_j}, \tag{2}$$

where $w_i$ and $w_j$ are the trade-off weights for the *i*th and *j*th criterion, respectively.

Any number of preference levels can be used in CJM, but to allow comparison with AHP, we use nine levels as in AHP. Table 1 presents the verbal preference statements and corresponding CJM weights in per cent. Equal preference is represented by the 50/50 ratio. The strongest preference is represented by the 90/10 ratio. If the DM’s preference falls between the listed statements, the in-between weights can be used. The scale has uniform step size.

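At each node of the hierarchy, the comparison values are organized into a *complementary judgment matrix*:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}. \tag{3}$$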

##### 2.2. Comparison with AHP

For comparison, the AHP comparison values are represented in the second column of Table 1. In AHP the comparison values, denoted here by $r_{ij}$, represent trade-off ratios:

$$r_{ij} = \frac{w_i}{w_j}. \tag{4}$$

To allow comparing the CJM scale with the AHP scale, we set the left-hand-side (LHS) of (2) equal to the LHS of (4) to obtain $a_{ij}/a_{ji} = r_{ij}$. Then we solve $a_{ji}$ from (1) and substitute this into the previous expression. This gives the transformation from AHP comparison values into corresponding CJM weights and vice versa:

$$a_{ij} = \frac{r_{ij}}{1 + r_{ij}}, \tag{5}$$

$$r_{ij} = \frac{a_{ij}}{1 - a_{ij}}. \tag{6}$$

The last two columns in Table 1 show the standard AHP scale transformed into CJM weights and CJM weights transformed into AHP comparison values. Both scales are equivalent for the first and last preference statement. However, the CJM scale has smaller steps between the weaker preference statements than AHP and larger steps between the stronger preference statements. The CJM scale has earlier been applied in AHP by Salo and Hämäläinen [24], who named it the *balanced scale* for AHP. Observe that the verbal preference statements carry only ordinal information, while the numerical values try to represent corresponding judgements on cardinal (ratio) scales. In general, no fixed cardinal scale can accurately represent the subjective verbal preference statements of different DMs, because DMs have different interpretations of the verbal statements and their relative intensities.
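The scale transformation of this section can be sketched in a few lines of Python. This is only an illustration, assuming the transformations read $a = r/(1+r)$ and $r = a/(1-a)$; the function names `ahp_to_cjm` and `cjm_to_ahp` are hypothetical and not part of the original method description.

```python
def ahp_to_cjm(r: float) -> float:
    """Map an AHP ratio r = w_i / w_j (r > 0) to the CJM weight a_ij = r / (1 + r)."""
    if r <= 0:
        raise ValueError("AHP comparison values must be positive")
    return r / (1.0 + r)


def cjm_to_ahp(a: float) -> float:
    """Map a CJM weight a_ij in (0, 1) to the AHP ratio r_ij = a_ij / (1 - a_ij)."""
    if not 0.0 < a < 1.0:
        raise ValueError("CJM weights must lie strictly between 0 and 1")
    return a / (1.0 - a)


# Print both directions for a few scale values (illustrative selection).
for r in (1, 3, 5, 7, 9):
    print(f"AHP {r} -> CJM {ahp_to_cjm(r):.3f}")
for a in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"CJM {a:.2f} -> AHP {cjm_to_ahp(a):.2f}")
```

The two functions are inverses of each other, and the end points of the nine-level scales coincide (AHP 1 with CJM 50%, AHP 9 with CJM 90%).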

##### 2.3. Solving the Weights

The complementary judgement matrix contains more information than necessary to determine the weights uniquely. The redundant information may serve for detecting inaccuracies and possible errors in the expressed preferences. If the judgment matrix is fully consistent, we can find weights that satisfy each equation (7) as equality. The matrix is consistent if $a_{ij}a_{jk}a_{ki} = a_{ji}a_{kj}a_{ik}$ for each *i*, *j*, *k*. In practice, the matrix may contain some level of inconsistency. In this case we can solve the weights from (7) in the least squares (LSQ) sense.

Different techniques for solving the weights have been presented in the literature. Here we present a technique that is a little simpler and computationally more efficient than the eigenvalue method of AHP. The eigenvalue method requires iterative calculation of the eigenvector, while the LSQ solution is obtained in closed form. First, we solve $a_{ij}$ from (1) and (2), obtaining

$$a_{ij} = \frac{w_i}{w_i + w_j}. \tag{7}$$

Then we multiply (7) by $(w_i + w_j)$ to obtain the linear system:

$$a_{ij}(w_i + w_j) = w_i, \quad i, j = 1, \dots, n. \tag{8}$$

In addition, to get a unique solution, we consider the normalization condition for the weights $\sum_{i=1}^{n} w_i = 1$. To solve the system we solve (arbitrarily) the last weight from the normalization equation:

$$w_n = 1 - \sum_{i=1}^{n-1} w_i \tag{9}$$

and substitute it into (8). This yields a linear equation system with $(n-1)$ variables and $n^2$ equations. The system can be easily reduced. Firstly, the equations corresponding to *i* = *j* hold trivially and can therefore be omitted. Secondly, due to symmetry, the error in the equation for $a_{ij}$ is the negative of that for $a_{ji}$. Therefore it is necessary to consider only the equations corresponding to either the upper or lower triangle of $A$. The resulting linear equation system with $(n-1)$ variables and $(n^2-n)/2$ constraints is of the following form:

$$\mathbf{H}\mathbf{x} = \mathbf{b}, \tag{10}$$

where $\mathbf{x} = [w_1, \dots, w_{n-1}]^T$. When *n* = 2, there is only a single equation and the consistent solution ($w_1 = b_1/H_{11}$) is trivially found. When *n* ≥ 3, the system is overdetermined and the LSQ solution is

$$\mathbf{x} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{b}. \tag{11}$$

Rather than forming the matrix inverse explicitly, system (11) is solved efficiently by forming the Cholesky factorization of the symmetric matrix $\mathbf{H}^T\mathbf{H} = \mathbf{L}\mathbf{L}^T$, where **L** is a lower triangular matrix (see e.g., Stewart [25]). Then an intermediate vector $\mathbf{y}$ is solved from the lower triangular system $\mathbf{L}\mathbf{y} = \mathbf{H}^T\mathbf{b}$, and after that $\mathbf{x}$ is solved from the upper triangular system $\mathbf{L}^T\mathbf{x} = \mathbf{y}$. After solving $\mathbf{x}$, the last weight $w_n$ is computed from (9).
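The LSQ procedure can be illustrated with a small dependency-free Python sketch. It assumes that each comparison contributes the linear equation $(a_{ij} - 1)w_i + a_{ij}w_j = 0$, eliminates the last weight through the normalization condition, and solves the normal equations by Cholesky factorization. The function name `solve_cjm_weights` and the dictionary input format are hypothetical choices made for this illustration.

```python
import math


def solve_cjm_weights(comparisons, n):
    """Least-squares CJM weights.

    comparisons: dict {(i, j): a_ij} with 0-based indices i != j (any connected subset).
    Returns a list of n weights that sum to 1.
    """
    m = n - 1  # free variables w_0 .. w_{n-2}; the last weight is 1 - sum(others)
    rows, rhs = [], []
    for (i, j), a in comparisons.items():
        # Equation (a - 1) * w_i + a * w_j = 0 over all n weights.
        coeff = [0.0] * n
        coeff[i] += a - 1.0
        coeff[j] += a
        # Substitute w_{n-1} = 1 - sum_{k < n-1} w_k.
        last = coeff[n - 1]
        rows.append([coeff[k] - last for k in range(m)])
        rhs.append(-last)
    # Normal equations (H^T H) x = H^T b.
    g = [[sum(r[p] * r[q] for r in rows) for q in range(m)] for p in range(m)]
    c = [sum(r[p] * y for r, y in zip(rows, rhs)) for p in range(m)]
    # Cholesky factorization H^T H = L L^T.
    low = [[0.0] * m for _ in range(m)]
    for p in range(m):
        for q in range(p + 1):
            s = g[p][q] - sum(low[p][k] * low[q][k] for k in range(q))
            low[p][q] = math.sqrt(s) if p == q else s / low[q][q]
    # Forward substitution L y = H^T b, then back substitution L^T x = y.
    y = [0.0] * m
    for p in range(m):
        y[p] = (c[p] - sum(low[p][k] * y[k] for k in range(p))) / low[p][p]
    x = [0.0] * m
    for p in reversed(range(m)):
        x[p] = (y[p] - sum(low[k][p] * x[k] for k in range(p + 1, m))) / low[p][p]
    return x + [1.0 - sum(x)]


# Illustrative consistent comparisons generated from weights (0.5, 0.3, 0.2).
example = {(0, 1): 0.625, (0, 2): 5.0 / 7.0, (1, 2): 0.6}
print(solve_cjm_weights(example, 3))
```

With consistent input the sketch recovers the generating weights; with inconsistent input it returns the LSQ compromise. Production code would more likely call a library least-squares routine.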

Observe that this method of solving the weights does not require a complete set of comparisons. A sufficient requirement is that the graph formed by pairwise comparisons between entities is connected. This gives great flexibility for the DM in large problems, where comparing every pair of entities would be too laborious.
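The connectivity requirement can be checked before solving the weights. The following sketch uses a plain depth-first search; the function name `comparisons_connected` and the 0-based entity indexing are assumptions made for illustration.

```python
def comparisons_connected(n, pairs):
    """Return True if the graph on entities 0..n-1 with the compared pairs as edges is connected."""
    adjacency = {k: set() for k in range(n)}
    for i, j in pairs:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n


# A chain of n - 1 comparisons is the minimal connected set.
print(comparisons_connected(4, [(0, 1), (1, 2), (2, 3)]))
# Entity 3 is never compared with anything, so the weights are not determined.
print(comparisons_connected(4, [(0, 1), (1, 2)]))
```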

LSQ solution of the weights is also applicable with multiple DMs who provide their (precise) comparisons independently. All comparisons are then collected into a common linear equation system (8) from which the weights are solved. This approach finds weights that satisfy different preferences in the LSQ sense. Section 3 describes another way to handle the preferences of multiple DMs.

##### 2.4. Evaluating the Scores in the Hierarchy

After the weights have been computed at each node of the criteria hierarchy, a score is computed for each alternative $i$. At the lowest level criteria nodes *t*, the *criterion score* $s_i^t$ for each alternative equals its weight, $s_i^t = w_i^t$. At the higher level nodes, the score for each alternative is computed as a weighted average of the scores at the lower level. Writing the node identifier as superscript for scores and weights, we have

$$s_i^t = \sum_{u \in S(t)} w_u^t \, s_i^u, \tag{12}$$

where *S*(*t*) refers to the set of subnodes of node $t$ in the hierarchy. The *overall score* for each alternative is the score computed at the top node.
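The bottom-up score evaluation can be sketched recursively. The nested-dictionary representation of the hierarchy, the key names `children` and `alt_weights`, and the numbers in the example are all assumptions made for illustration only.

```python
def node_scores(node, alternatives):
    """Return {alternative: score} for one node of the criteria hierarchy."""
    if "children" not in node:
        # Lowest-level criterion: the score of each alternative equals its weight.
        return {alt: node["alt_weights"][alt] for alt in alternatives}
    totals = {alt: 0.0 for alt in alternatives}
    for child_weight, child in node["children"]:
        child_scores = node_scores(child, alternatives)
        for alt in alternatives:
            # Weighted average of the scores at the lower level.
            totals[alt] += child_weight * child_scores[alt]
    return totals


# Made-up two-criterion hierarchy with two alternatives.
hierarchy = {
    "children": [
        (0.75, {"alt_weights": {"A": 0.7, "B": 0.3}}),
        (0.25, {"alt_weights": {"A": 0.2, "B": 0.8}}),
    ]
}
print(node_scores(hierarchy, ["A", "B"]))
```

Because the criterion weights and the alternative weights are each normalized, the overall scores also sum to 1.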

##### 2.5. Measuring the Inconsistency

The redundant information provided by pairwise comparisons serves two purposes in the CJM method. Firstly, the weights solved from the overdetermined system (7)-(8) can provide more accurate preference information compared to the case where only a minimal number of comparisons are made. Secondly, large inconsistency may indicate that the DM has expressed his/her preferences incorrectly. The DM is encouraged to revise his/her comparisons if too large inconsistency is detected.

Xu [26] suggests that the AHP inconsistency ratio (IR) is computed also in the CJM method to detect excess inconsistency. Before this method can be applied, it is necessary to transform the CJM weights into the corresponding reciprocal matrix of AHP using (6). Then a consistency index CI = ($\lambda_{\max}$ - *n*)/(*n* - 1) is computed, where $\lambda_{\max}$ is the principal eigenvalue and $n$ is the dimension of the reciprocal matrix. Finally, IR = CI/RI, where the random index RI is the average consistency index of a large number of random reciprocal matrices. If IR exceeds 10%, the DM is urged to revise his/her comparisons.
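As a rough illustration of this procedure, the sketch below converts a complete CJM matrix into a reciprocal matrix with $r = a/(1-a)$, estimates the principal eigenvalue by power iteration, and forms CI and IR. The function names are hypothetical, power iteration is only one of several ways to obtain the eigenvalue, and the random index is left as a caller-supplied parameter because its tabulated values are not reproduced here.

```python
def cjm_to_reciprocal(a):
    """Convert a complete CJM matrix (list of lists) into an AHP reciprocal matrix."""
    n = len(a)
    return [[1.0 if i == j else a[i][j] / (1.0 - a[i][j]) for j in range(n)]
            for i in range(n)]


def consistency_index(r, iterations=500):
    """CI = (lambda_max - n) / (n - 1), with lambda_max estimated by power iteration."""
    n = len(r)
    v = [1.0 / n] * n  # kept normalized so that sum(v) == 1
    lam = float(n)
    for _ in range(iterations):
        rv = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(rv)  # because sum(v) == 1, sum(R v) estimates lambda_max
        v = [x / lam for x in rv]
    return (lam - n) / (n - 1)


def inconsistency_ratio(r, random_index):
    """IR = CI / RI for a caller-supplied random index RI."""
    return consistency_index(r) / random_index


# Illustrative inconsistent matrix: 70/30 for all three pairs.
a = [[0.5, 0.7, 0.7], [0.3, 0.5, 0.7], [0.3, 0.3, 0.5]]
print(consistency_index(cjm_to_reciprocal(a)))
```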

We suggest here a different technique for the CJM method. We simply compute the *inconsistency errors* in (7) based on the expressed $a_{ij}$ and the weights from the LSQ solution:

$$e_{ij} = a_{ij} - \frac{w_i}{w_i + w_j}. \tag{13}$$

Observe that $e_{ij} = -e_{ji}$. If the absolute value is too large for any of the comparisons, the DM should reconsider his/her comparisons. The advantage of the inconsistency errors is that the DM can understand them easily, because they are directly related to his/her comparison values. Another advantage of inconsistency errors is that they can pinpoint the comparisons that are most likely incorrect. If only one or a few comparisons are found too inconsistent, it may be sufficient that the DM only reconsiders these. If many comparisons are inconsistent, we suggest that the DM reconsiders all comparisons. The DM can specify a *threshold* for the inconsistency errors to identify too inconsistent comparisons based on his/her accuracy level when making the comparisons. We suggest ±0.1 as a reasonable threshold, because it corresponds to one step uncertainty on the verbal scale.
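A minimal sketch of the inconsistency error check follows, assuming the error is the expressed value minus $w_i/(w_i + w_j)$. The function names are hypothetical; the numerical example uses the comparison between criteria Two and Four reported in Section 4.1 (expressed value 0.30, weights 8.3% and 10.7%).

```python
def inconsistency_errors(comparisons, weights):
    """Return {(i, j): a_ij - w_i / (w_i + w_j)} for every expressed comparison."""
    return {
        (i, j): a - weights[i] / (weights[i] + weights[j])
        for (i, j), a in comparisons.items()
    }


def flag_inconsistent(errors, threshold=0.1):
    """List the pairs whose absolute inconsistency error exceeds the threshold."""
    return sorted(pair for pair, e in errors.items() if abs(e) > threshold)


# Comparison between criteria Two and Four from the example in Section 4.1.
errors = inconsistency_errors({("Two", "Four"): 0.30}, {"Two": 0.083, "Four": 0.107})
print(errors, flag_inconsistent(errors))
```

The error evaluates to about -0.14, which exceeds the suggested ±0.1 threshold, so this comparison is flagged for reconsideration.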

#### 3. The SMAA-CJM Method

Some restrictions of the basic CJM method are that it cannot treat imprecise information, and it does not explicitly support combining the preferences of multiple DMs. Some extensions of the CJM method exist for treating imprecise information as intervals [14] or fuzzy numbers [16]. In this paper, we extend the CJM method by representing the elements of the complementary judgment matrix as probability distributions.

The DMs can give their pairwise comparisons either as precise values or as intervals. The inconsistency errors are computed for each DM, and if they are too large, the DMs are allowed to revise their comparisons. In case of intervals, we suggest computing the inconsistency errors based on the midpoints of the intervals. We next combine the individual DMs’ pairwise comparisons into intervals $[a_{ij}^{\min}, a_{ij}^{\max}]$, where $a_{ij}^{\min}$ is the minimal value that any DM has expressed and $a_{ij}^{\max}$ is the maximal value. The aggregated comparison values are then represented by stochastic variables with a suitable probability distribution in the intervals. The complementary value pairs ($a_{ij}$, $a_{ji}$) are treated as dependent distributions to make their sum 1. Technically, it is possible to use arbitrary distributions. However, in the absence of information about the distribution shape, we apply a uniform distribution in the interval. More complex distributions can be estimated based on preference information provided by a large number of DMs. If the interval is degenerate, i.e., $a_{ij}^{\min} = a_{ij}^{\max}$, we use Dirac’s delta function (the unit impulse function) as the distribution.
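The aggregation and sampling step can be sketched as follows. The input convention (each DM gives either a single value or a `(low, high)` tuple) and the function names are assumptions made for this illustration; the uniform distribution is the default choice named in the text.

```python
import random


def aggregate_interval(dm_values):
    """Combine the DMs' statements for one comparison into (min, max).

    Each entry is either a precise value or a (low, high) tuple.
    """
    lows = [v[0] if isinstance(v, tuple) else v for v in dm_values]
    highs = [v[1] if isinstance(v, tuple) else v for v in dm_values]
    return min(lows), max(highs)


def sample_pair(low, high, rng):
    """Draw a_ij uniformly from [low, high]; a_ji is its complement, so the pair sums to 1."""
    a_ij = low if low == high else rng.uniform(low, high)  # degenerate interval: fixed value
    return a_ij, 1.0 - a_ij


rng = random.Random(42)
low, high = aggregate_interval([0.60, (0.55, 0.70), 0.65])
print(low, high, sample_pair(low, high, rng))
```

Sampling only $a_{ij}$ and deriving $a_{ji}$ as its complement is one simple way to realise the dependence between the two values.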

After representing the aggregated pairwise comparisons by suitable distributions, the performance of each alternative is analysed through stochastic simulation by drawing simultaneously pairwise comparisons from their corresponding distributions and computing the score for each alternative as in the CJM method. A sufficient number of simulation rounds is between 10 000 and 100 000 [27]. During the simulation, statistics are collected about the weights at different nodes of the hierarchy, the overall score of the alternatives, and their ranking. Based on the statistics, the following descriptive measures are computed for evaluating the alternatives.

(i) *Average overall score* for different alternatives. This generalizes the crisp CJM overall score to consider imprecise comparison values.

(ii) *Average criterion score* for different alternatives. This generalizes the corresponding crisp CJM criterion scores to consider imprecise comparison values.

(iii) The *rank acceptability index* $b_i^r$ measures the variety of different preferences that grant alternative $i$ rank $r$. The rank acceptability indices can be used for ranking the alternatives roughly, or for finding compromise alternatives in case no alternative obtains sufficient acceptability for the first rank. Potential compromise alternatives are those with high acceptability for the best ranks. Alternatives that obtain high acceptability for the worst ranks should be avoided [22].

(iv) The first rank *acceptability index* $b_i^1$ measures the variety of different preferences that make alternative $i$ most preferred. In other words, the acceptability index measures how widely acceptable the alternative is. The acceptability index can be interpreted as the share of people voting for the alternative, assuming that the applied distribution for comparison values represents the voters’ preferences. Zero acceptability means that the alternative is inefficient, i.e., no preferences make it best [21].

(v) The *pairwise winning index* $c_{ik}$ is the probability for alternative $i$ to be more preferred than alternative $k$. This index can be used to exclude alternatives that are dominated by others [28] and also for forming a stochastic ranking among the alternatives [29, 30].

(vi) The *central weight vector* $w_i^c$ describes what kinds of weights are favourable for alternative *i*, i.e., make it most preferred. The central weights can be presented to the DMs in order to help them understand how different weights correspond to different choices with the assumed preference model. The central weights are undefined for inefficient alternatives [22].

(vii) The *confidence factor* $p_i^c$ is the probability for alternative $i$ to be most preferred when the central weight vector for that alternative is selected. In other words, the confidence factor measures if the performance of the alternative has been assessed accurately enough, so that it can be selected under favourable preferences between criteria [22].
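The statistics collection for the rank acceptability and pairwise winning indices can be sketched as a simple Monte Carlo loop. The function name `smaa_indices` and the callback `sample_scores` (which stands in for one full round of drawing comparisons and computing CJM overall scores) are hypothetical; central weights and confidence factors are omitted to keep the sketch short.

```python
import random


def smaa_indices(sample_scores, n_alternatives, rounds=10_000, seed=0):
    """Estimate rank acceptability (b[i][r]) and pairwise winning (c[i][k]) indices.

    sample_scores(rng) must return one simulated list of overall scores.
    Ranks are 0-based here: b[i][0] is the first rank acceptability.
    """
    rng = random.Random(seed)
    rank_counts = [[0] * n_alternatives for _ in range(n_alternatives)]
    win_counts = [[0] * n_alternatives for _ in range(n_alternatives)]
    for _ in range(rounds):
        scores = sample_scores(rng)
        order = sorted(range(n_alternatives), key=lambda i: -scores[i])
        for rank, alt in enumerate(order):
            rank_counts[alt][rank] += 1
        for i in range(n_alternatives):
            for k in range(n_alternatives):
                if scores[i] > scores[k]:
                    win_counts[i][k] += 1
    b = [[count / rounds for count in row] for row in rank_counts]
    c = [[count / rounds for count in row] for row in win_counts]
    return b, c


# Illustrative sampler: fixed base scores plus small uniform noise (made-up numbers).
def noisy_scores(rng):
    return [s + rng.uniform(-0.02, 0.02) for s in (0.41, 0.29, 0.30)]


b, c = smaa_indices(noisy_scores, 3, rounds=2000)
print(b[0][0], c[2][1])
```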

#### 4. Example

To illustrate the SMAA-CJM method, we consider the AHP problem for evaluating 3 high schools (A, B, C) in terms of 6 criteria (One,…, Six) [31]. First we evaluate the problem using precise comparisons in CJM and compare the results with AHP. Secondly, we evaluate the problem with a smaller number of pairwise comparisons in CJM, thirdly by considering the comparisons as imprecise, and fourthly by assuming that comparison information between criteria is missing.

##### 4.1. Precise Comparisons

In the original AHP problem, the preferences were expressed verbally and mapped on the AHP scale (1, 2, …, 9). For CJM comparisons we use the uniform scale (50%, 55%, …, 90%) presented in Table 1. The resulting CJM comparisons for the problem are shown in Table 2 between the criteria and in Table 3 between the alternatives. To omit redundant information, only the upper triangle of each comparison matrix is presented, because the diagonal elements are equal to 0.5 and the lower triangle elements are equal to the complement of the upper triangle.

Solving the weights from the precise CJM comparisons gives the criterion scores, average weights and overall scores for alternatives shown in Table 4(a). Alternative A obtains the highest score 0.41 followed by C (0.30) and B (0.29). With precise information, the alternatives (A, B, C) obtain distinct ranks (1, 3, 2) deterministically. This is indicated by the rank acceptability indices $b_A^1 = b_B^3 = b_C^2 = 100\%$ and zero for the remaining indices, as well as the pairwise winning indices $c_{AB} = c_{AC} = c_{CB} = 100\%$. However, because subjective information from the DM is always uncertain and B and C obtain almost identical overall scores, alternatives B and C could be considered equally good in practice.

Table 4(b) shows the corresponding results using standard AHP. We observe that the CJM results are somewhat different from AHP results. Alternative A obtains almost identical overall score and the best rank with both methods. However, alternatives B and C obtain different overall scores and reversed ranks. Also, the criterion scores and the criterion weights are quite different. The differences are mainly due to the different scales used to represent verbal preference statements in CJM and AHP. Transforming the comparisons on the CJM scale by (6) into AHP comparisons (CJM->AHP column in Table 1) and evaluating the model using AHP gives nearly identical results as the CJM method. This is natural, because both the LSQ solution and eigenvalue method give the same weights with consistent comparisons and, as we will see, in this example only small inconsistency is present.

Next we evaluate the consistency of the CJM comparisons in terms of the inconsistency errors ($e_{ij}$) introduced in this paper and in terms of the inconsistency ratio (IR) of AHP. For the comparisons between criteria (Table 2) the maximal inconsistency error -0.14 occurs between criteria Two and Four (by formula (7) with $a_{24}$ = 0.30, $w_2$ = 8.3% and $w_4$ = 10.7%). All other inconsistency errors are well within the threshold of ±0.1. Because this one error exceeds the threshold, we would suggest that the DM reconsider his/her comparisons between criteria, and in particular the comparison between criteria Two and Four. The negative sign of the inconsistency error indicates that the expressed comparison value (30%) is smaller than the consistent value (44%). Instead of the preference statement ‘criterion Four is moderately more important than Two’, a consistent statement would have been between ‘a little more important’ and ‘equally important’.

The IR for the comparisons between criteria is 0.02 (CI = 0.03, RI = 1.25) which is clearly below the suggested threshold 0.1 for sufficient consistency. Because IR is a kind of average measure for inconsistency, it is insensitive to a single inconsistent comparison and fails to detect the clearly inconsistent comparison. For related discussion, see Bana e Costa and Vansnick [32]. Also, the IR does not pinpoint the most likely sources of inconsistency. In the original AHP model the comparisons between criteria were slightly too inconsistent with CI=0.137, RI=1.24 and IR=0.109.

For the comparisons between alternatives (Table 3) all inconsistency errors are clearly below the suggested threshold, with the largest inconsistency error of -0.02 found at criterion One between alternatives A and C. Also the IRs are well below the consistency threshold (IR = 0.004, 0, 0, 0.00014, 0, and 0.0006, correspondingly).

##### 4.2. Smaller Number of Pairwise Comparisons

The disadvantage of performing the full set of pairwise comparisons between each pair of entities (alternatives or criteria) is that the number of comparisons increases quadratically with the number of compared entities. With $n$ compared entities, the number of pairwise comparisons is *n*(*n*-1)/2. When the number of compared entities is large, performing the full set of comparisons is in practice infeasible due to the large cognitive load on the decision-maker. For example, Saaty and Ozdemir [33] suggest that the full set of comparisons with more than 7 entities inherently leads to inconsistency. Bozóki et al. [34] demonstrated the increase of inconsistency empirically and showed that a subset of the comparisons can be used to approximate the results based on the full set of comparisons.

The LSQ method for solving the weights in CJM works also with a subset of pairwise comparisons, provided that the graph formed by pairwise comparisons between entities is connected. This means that for each pair of entities A, B, they are either compared directly, or there exists a path of comparisons connecting A and B via other entities. The minimal sufficient number of comparisons is* n*-1. Of course, in that case no redundant information is provided, equation system (8) has a unique solution, and the LSQ method is not required.

As a compromise between the maximal and minimal number of comparisons, we suggest (for problems with many entities) comparing each entity systematically only with a small number of other entities. In the following, we suggest two methods for reducing the number of comparisons.

Before making comparisons, the DM should first order the entities according to their importance or preference. Saaty [4] applied such ordering in an example, although he did not explicitly define the ordering as part of the AHP procedure. Ordering the entities simplifies making the pairwise comparisons, because the mutual order of each pair of entities has already been determined and only the (verbal or numerical) preference statement is required. We believe that this reduces the risk of mistakes in preference statements. Also, individual* ordinally inconsistent* comparisons are easy to spot immediately from the comparison matrix when it is ordered this way (see Section 5 and Xu et al. [35]).

*Method 1. *After ordering the entities, the DM compares each entity only with the two following entities. The necessary number of comparisons is then 2*n*-3. For example, with 6 entities, a total of 9 comparisons are required: 1&2, 1&3, 2&3, 2&4, 3&4, 3&5, 4&5, 4&6, 5&6. This method has the advantage that each entity is compared only with entities that are as similar as possible; no more than two places before or after itself. Comparisons between extremely different entities are avoided. This is good because it is difficult to express accurate comparisons between very different entities.

*Method 2.* After ordering the entities, the DM compares each entity only with the first and last entity. With 6 entities this method results in the comparisons: 1&6, 1&2, 2&6, 1&3, 3&6, 1&4, 4&6, 1&5, 5&6. This method has the advantage that it reduces the DM's cognitive load in the comparisons because during the process he/she becomes ‘more familiar’ with the first and last entities, and at least one of them appears in every comparison.
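Both reduction methods are easy to enumerate programmatically. The sketch below uses 1-based entity positions after ordering; the function names `method1_pairs` and `method2_pairs` are hypothetical.

```python
def method1_pairs(n):
    """Method 1: compare each entity with the two entities that follow it in the ordering."""
    return [(i, j) for i in range(1, n + 1) for j in (i + 1, i + 2) if j <= n]


def method2_pairs(n):
    """Method 2: compare each entity with the first and the last entity in the ordering."""
    pairs = [(1, n)]
    for k in range(2, n):
        pairs += [(1, k), (k, n)]
    return pairs


for n in (4, 6, 10):
    full = n * (n - 1) // 2
    print(n, full, len(method1_pairs(n)), len(method2_pairs(n)))
```

For 6 entities both methods give 9 comparisons (2*n*-3) instead of the 15 needed for the full set, and the saving grows with the number of entities.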

We should point out that we are not suggesting any particular order in which the subset of comparisons is made. Bozóki et al. [34] came to the slightly surprising conclusion that the order in which the pairwise comparisons are made has no effect on the consistency. However, the order is not irrelevant, because the order of questions may affect the results through the anchoring bias.

We illustrate smaller sets of comparisons using the school selection problem. After ordering the criteria into importance order, the full set of comparisons is shown in Figure 2. Note that after ordering, all consistent comparisons in the upper triangle should be at least 50%. Also, consistent comparisons should satisfy $a_{ij} \le a_{i,j+1}$ along rows and $a_{i+1,j} \le a_{ij}$ along columns of the CJM. Some small violations of the latter conditions do appear in Figure 2.

The comparisons according to the first method appear on the bottom two diagonals and for the second method on the first row and last column of Figure 2. Table 5 shows the criterion weights and overall scores for alternatives using the full set of comparisons and using the subsets by the first and second method for reducing the number of comparisons. Because there is some inconsistency in the comparisons, it is natural that the weights differ depending on which subset of comparisons is included. The differences are quite small, maximally about 3% points. However, the importance order of weights is the same in all three cases. The overall scores for the alternatives are in practice identical using different sets of comparisons, resulting in the same recommendation:* Alternative A is best and B and C are in practice equally good*.

##### 4.3. Imprecise Comparisons

Next we introduce imprecision to the problem and analyse it using SMAA-CJM. We assume that the uncertainty of each comparison in Tables 2 and 3 is ±10% points and use a uniform distribution to represent this uncertainty. Solving the model with imprecision gives almost identical criterion scores, average weights, and overall scores for alternatives as with precise comparisons (Table 4(a)). However, the ranking of the alternatives becomes uncertain, as shown by the rank acceptability indices and pairwise winning indices in Figure 3.

**Figure 3.** (a) Rank acceptability indices; (b) pairwise winning indices.

Alternative A with 99.97% first rank acceptability is in practice the only candidate for the first rank while alternatives B and C obtain only 0.06% and 0.01% acceptability for the first rank. However, the second rank acceptability of B and C is now 42% and 58%, which shows clearly that we cannot be sure about which alternative is the second best one. The same conclusions can be drawn from the pairwise winning indices $c_{AB}$ = 99.94% and $c_{AC}$ = 99.99%, while alternatives B and C beat each other with 42% and 58% probability, correspondingly. In this case the pairwise winning indices between B and C are almost identical to their second rank acceptability indices because A almost always obtains the first rank. The central weights for A nearly coincide with the average weights (bottom row of Table 4(a)), resulting in the confidence factor $p_A^c$ = 100%. The confidence factors of B and C are 0.02%, which means that even considering the uncertainty of criteria preference information, these alternatives are in practice inefficient.

##### 4.4. Missing Comparisons between Criteria

We demonstrate next how the SMAA-CJM method can be used when no comparison information among criteria (Table 2) exists. We consider only the pairwise comparisons between alternatives with respect to different criteria (Table 3) and the associated ±10% point imprecision for the comparisons between alternatives with respect to each criterion. We represent missing preferences among the criteria by nonnegative normalized weights, $w_j \ge 0$, $\sum_j w_j = 1$, that follow a uniform joint distribution.

The resulting criterion scores are identical to those of the previous analysis, because we have the same comparison and uncertainty information between alternatives. The uniform weight distribution results in average weights for each criterion equal to 1/6 ≈ 16.7%. The average overall scores for the alternatives A, B, C are 0.43, 0.26, and 0.30, correspondingly.
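Uniformly distributed normalized weights can be generated, for example, by cutting the unit interval at sorted uniform random points and using the gap lengths as weights. The source does not state which sampling technique is used, so the following is only one possible sketch; the function name `uniform_simplex_weights` is hypothetical.

```python
import random


def uniform_simplex_weights(n, rng):
    """Draw n nonnegative weights summing to 1, uniformly over the weight simplex."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    bounds = [0.0] + cuts + [1.0]
    # The gaps between consecutive sorted cut points are uniformly distributed on the simplex.
    return [bounds[k + 1] - bounds[k] for k in range(n)]


rng = random.Random(7)
rounds = 20_000
sums = [0.0] * 6
for _ in range(rounds):
    for k, w in enumerate(uniform_simplex_weights(6, rng)):
        sums[k] += w
# Each average should be close to 1/6, in line with the average weights reported above.
print([round(s / rounds, 3) for s in sums])
```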

Figure 4 shows the resulting rank acceptability indices and pairwise winning indices. We observe that the increased uncertainty in the comparisons is reflected as increased uncertainty in the ranking. Now both B and C with 4.7% and 3.5% first rank acceptability could, at least in theory, be the best one under suitable preferences for criteria.

**Figure 4.** (a) Rank acceptability indices; (b) pairwise winning indices: School 1 versus 2.

Table 6 shows the central weights and confidence factors for the alternatives. The central weights identify what kind of trade-off weights between criteria make each alternative most preferred. We can see that different alternatives are favoured by dramatically different weights. For example, alternative B would require about 39% of the weight to be placed on criterion One alone. The confidence factor for B is 55%, which means that even with its central weights B will not be the best alternative with certainty. For C, the confidence factor is even lower, only 19%. This means that the criteria measurements are too uncertain to justify choosing C even with its central weights.

#### 5. Comparison of CJM and AHP Scales

The most significant differences between CJM and AHP results stem from the different scales used to represent verbal preference statements. Because verbal preference statements carry only ordinal information and DMs interpret the intensities of the statements differently, no fixed numerical scale can properly represent the ordinal verbal comparisons. However, the integer scale of AHP from 1 to 9 is particularly problematic, because in many cases it is impossible to express consistent comparisons between three or more entities. For example, if criterion 1 is moderately more important than criterion 2 (AHP value 5) and criterion 2 is moderately more important than criterion 3 (AHP value 5), a consistent comparison between criteria 1 and 3 is impossible to express using the AHP scale (5 × 5 = 25, which exceeds the largest scale value 9). This problem occurs partly because the AHP scale is too sparse for the weaker preference statements and too dense for the stronger ones. A cascade of a few comparisons, even with very weak preference values, soon exceeds the strongest value. Because the CJM scale is better balanced, i.e., denser for weaker preferences and sparser for stronger preferences, we wanted to test whether it performs better than the AHP scale. With the CJM scale, the above example results in the CJM comparisons $a_{12}=a_{23}=70\%$, which correspond to the ratios $a_{12}/a_{21}=a_{23}/a_{32}=0.7/0.3\approx 2.33$. The consistent ratio $(0.7/0.3)^2\approx 5.44$ corresponds to the CJM comparison $a_{13}=84\%$, which is very close to the scale value 85%.
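The arithmetic of this example can be checked with a short Python sketch. It assumes only the complementarity of CJM comparisons (the two values of a pair sum to 1), so that a comparison value a corresponds to the ratio a / (1 − a). The helper names `cjm_to_ratio` and `ratio_to_cjm` are illustrative.

```python
def cjm_to_ratio(a):
    """Ratio implied by a complementary pair (a, 1 - a)."""
    return a / (1.0 - a)

def ratio_to_cjm(r):
    """Complementary comparison value implied by a ratio r."""
    return r / (1.0 + r)

step = cjm_to_ratio(0.70)                        # one comparison at 70%
consistent_ratio = step * step                   # two chained comparisons
consistent_cjm = ratio_to_cjm(consistent_ratio)  # back on the CJM scale
print(round(step, 2), round(consistent_ratio, 2), round(consistent_cjm, 2))
# prints: 2.33 5.44 0.84
```

The chained comparison stays inside the CJM scale, whereas the corresponding product of AHP values leaves the 1 to 9 range.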

To compare the two scales, we generated a large number of random *ordinally consistent* comparison matrices for different numbers of criteria. Ordinal consistency means the natural transitivity of preference statements that a consistent and logical DM should follow [35]. The transitivity of preference statements can be expressed as follows:

(i) If A is preferred to B by intensity $p$ and B is preferred to C by intensity $q$, then A is preferred to C by intensity at least $\max\{p, q\}$.

Random ordinally consistent comparison matrices between $n$ entities and $m$ levels of preference intensity can be generated by first generating a set of random weights $w_1 > w_2 > \dots > w_n$ and thresholds $t_1 < t_2 < \dots < t_m$ in a common range and then setting each comparison value equal to the smallest intensity $k$ such that $w_i - w_j < t_k$. We note that the simulated weights satisfy the condition of order preservation (COP) with respect to the generated comparisons [32].
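The generation procedure can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: weights and thresholds are drawn from [0, 1), the top threshold is fixed at 1.0 so that every weight difference falls below some threshold, and intensities are returned as level indices 1, …, m rather than as AHP or CJM scale values. The consistency check uses one reading of transitivity, namely that the intensity from A to C is at least the larger of the intensities from A to B and from B to C. All names are hypothetical.

```python
import random

def random_ordinal_matrix(n, m, rng):
    """Return (weights, thresholds, levels); levels[i][j] is the intensity
    level (1..m) by which entity i is preferred to entity j, for i < j."""
    weights = sorted((rng.random() for _ in range(n)), reverse=True)
    # Assumption: the largest threshold is 1.0, so each difference
    # weights[i] - weights[j] (always < 1 here) is below some threshold.
    thresholds = sorted(rng.random() for _ in range(m - 1)) + [1.0]
    levels = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = weights[i] - weights[j]
            # Smallest level k whose threshold exceeds the difference.
            levels[i][j] = next(
                k for k, t in enumerate(thresholds, start=1) if diff < t
            )
    return weights, thresholds, levels

def is_ordinally_consistent(levels):
    """Check level(i, k) >= max(level(i, j), level(j, k)) for all i < j < k."""
    n = len(levels)
    return all(
        levels[i][k] >= max(levels[i][j], levels[j][k])
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
    )

rng = random.Random(42)
weights, thresholds, levels = random_ordinal_matrix(5, 5, rng)
print(is_ordinally_consistent(levels))
```

Because the difference between the first and the third weight is the sum of two nonnegative differences, the check holds by construction for every matrix generated this way.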

Table 7 shows the average IRs and inconsistency errors for 1000 randomly generated ordinally consistent comparison matrices using the AHP and CJM scales. Using the AHP scale, the average IR (Table 7) is clearly above the suggested consistency threshold of 0.1. This means that even when the DM is ordinally consistent, the AHP scale comparisons are on average cardinally inconsistent. For the CJM scale, the average IR (Table 7) is clearly consistent. When measured by the IR, the CJM scale gives from 2.6 to 2.9 times better consistency than the AHP scale.

In terms of the average inconsistency error, the results are similar. The inconsistency error for each matrix is the maximal absolute error among its comparisons, and Table 7 shows the average for each number of criteria. Using the AHP scale, the inconsistency error increases with the number of criteria (Table 7). This is natural, because with a larger number of comparisons, the maximal error is likely to be larger. Except for the 3-criterion case, the inconsistency errors clearly exceed the suggested threshold of 0.1. Using the CJM scale, the inconsistency errors exceed the suggested threshold only with 5 or more criteria. Also when measured by the inconsistency error, the CJM scale gives clearly better consistency than the AHP scale: from 1.4 to 1.8 times better.

We conclude that although not perfect, the more balanced CJM scale is clearly better than the AHP scale in its ability to represent the cardinal preferences of an ordinally consistent DM. Similar results were previously obtained by Pöyhönen et al. [36] who compared the two scales empirically using a group of students.

#### 6. Discussion

The comparison values of CJM have a natural interpretation. Considering only two criteria at a time, the DM can interpret the comparison values as trade-off weights that he/she assigns to the criteria. The DM can express these weights either as a normalized complementary pair (e.g., 0.8, 0.2) or as a pair of nonnegative numbers (e.g., 4, 1) that are normalized to satisfy the complementarity condition. Similarly, when comparing two alternatives with respect to a criterion, the DM is in effect distributing partial value between the two alternatives. Of course, it is also possible to evaluate the performance of alternatives through other techniques and to use pairwise comparisons only for assessing criteria weights. For example, criteria measured on natural scales can be normalized to partial values in the range $[0, 1]$. This makes the CJM method conformant with linear value theory.

No fixed cardinal scale can represent precisely the verbal preference statements of different DMs. Instead, each DM could define their own cardinal scale that represents his/her verbal statements. Alternatively, DMs could express their preferences cardinally in the first place. In practice this may be difficult for many DMs.

Another approach for cardinalizing ordinal preference statements is based on ordinal regression, as in the UTA [37], MACBETH [38, 39], and GRIP [40] methods. These methods use verbal preference and indifference statements between pairs of alternatives or criteria to assess constraints on the parameters of an additive value function.

#### 7. Conclusions

We have introduced the SMAA-CJM method for representing uncertain or imprecise information through stochastic distributions in the Complementary Judgment Matrix method and a simulation approach for analysing the resulting model. A particular strength of the method is that it allows flexible modelling of different kinds of imprecision, uncertainty, or even partially missing preference information. This is useful in decision processes, where the information is gradually refined during the process. The method is also suitable for group decision-making problems, where it is difficult for DMs to agree on precise pairwise comparisons. The method allows using distributions that include each DM’s preferences. Alternatively, the weight solution method of CJM can find weights that match different DMs’ preferences as well as possible in the LSQ sense.

We also introduced the *inconsistency error* as a measure of how consistent each comparison is, i.e., how much each comparison value differs from the consistent value. These measures are easy for the DMs to understand, because they are directly related to their comparison values. Another advantage of inconsistency errors is that they identify the comparisons that are most likely incorrect. A reasonable threshold for the inconsistency errors is ±0.1, corresponding to a one-step uncertainty on the verbal scale. This threshold is also easy for the DMs to understand.

We conducted simulation experiments using a large number of ordinally consistent comparison matrices of different sizes (3,…,8 criteria). The results showed that the balanced comparison scale of CJM yields more consistent comparisons than the standard AHP scale. The consistency was better in terms of both the inconsistency ratio (IR) of AHP and the inconsistency error of SMAA-CJM. An earlier empirical study with students gave similar results.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper. The funds received did not lead to any conflicts of interest.

#### Acknowledgments

This work was supported by the China National Key Research and Development Program-China-Finland Intergovernmental Cooperation in Science and Technology Innovation (Funding no. 2016YFE0114500) and Academy of Finland Funding (Grant no. 299186). The authors would also like to acknowledge the ‘Xinghai’ Talent Project of Dalian University of Technology.