#### Abstract

Gathered data is frequently not in a numerical form that allows immediate application of quantitative mathematical-statistical methods. This paper examines some basic aspects of how quantitative statistical methodology can be utilized in the analysis of qualitative data sets. The transformation of qualitative data into numeric values is considered the entry point to quantitative analysis. Related publications and the impacts of scale transformations are discussed alongside. Subsequently, it is shown how correlation coefficients can be used in conjunction with data aggregation constraints to construct relationship modelling matrices. For illustration, a case study is referenced in which ordinal-scaled qualitative survey answers are allocated to process-defining procedures as aggregation levels. Finally, options for measuring the adherence of the gathered empirical data to such derived aggregation models are introduced, and a statistically based approach to evaluating the reliability of the chosen model specification is outlined.

#### 1. Introduction

This paper discusses how data of qualitative category type, often gathered via questionnaires and surveys, can be transformed into appropriate numerical values to enable the full spectrum of quantitative mathematical-statistical analysis methodology. To this end, the impacts of the chosen valuation transformation from ordinal scales to interval scales and their relations to statistical and measurement modelling are studied. This is applied to demonstrate ways to measure the adherence of a quantitative data representation to qualitative aggregation assessments based on statistical modelling. Finally, an approach to evaluate such adherence models is introduced. Alongside, a brief summary of related publications is given and examples from a case study are referenced.

Data gathering refers to a typology of two basic modes of inquiry, associated with “qualitative” and “quantitative” survey results. Here “quantitative” denotes a response given directly as a numeric value and “qualitative” a nonnumeric answer. This differentiation has its roots in the social sciences and research. A brief comparison of this typology is given in [1, 2]. A refinement by adding the predicates “objective” and “subjective” is introduced in [3]. An elaboration of the method usage in social science and psychology is presented in [4]. A precis on the qualitative type can be found in [5] and for the quantitative type in [6]. A comprehensive book about the qualitative methodology in social science and research is [7]. Since both of these methodic approaches have advantages of their own, it is an ongoing effort to bridge the gap between them, to merge them, or to integrate them. Following [8], the conversion or transformation from qualitative data into quantitative data is called “quantizing” and the converse, from quantitative to qualitative, is named “qualitizing”. Research on mixed method designs has evolved within the last decade, starting from very basic approaches such as using sample counts as the quantitative base, from a strict separation that applies quantitative methods to quantitative data and qualitative methods to qualitative data, and from the observation that context information is lost to a significant degree if qualitative data (e.g., verbal or visual data) are converted into a numerical representation with a single meaning only [9].

A well-known model in social science is “triangulation”, which applies both methodic approaches independently and finally arrives at a combined interpretation result. The main mathematical-statistical method applied there is cluster analysis [10]. Model types with gradual differences in methodic approaches, from classical statistical hypothesis testing to complex triangulation modelling, are collected in [11]. Recently, it has been recognized that mixed methods designs can provide pragmatic advantages in exploring complex research questions. However, the analytic process of analyzing, coding, and integrating unstructured with structured data by quantizing qualitative data can be complex, time consuming, and expensive. In [12], Driscoll et al. present an example with simple statistical measures associated with strictly different response categories, in which the sample size issue in quantizing is also sketched.

A way of linking qualitative and quantitative results mathematically can be found in [13]. There, fuzzy logic-based transformations are examined to gain insights from one aspect type over the other. Qualitative and quantitative concepts are also utilized in mathematical modeling. In terms of decision theory [14], Gascon examined properties and constraints on timelines with LTL (linear temporal logic), categorizing qualitative as likewise nondeterministic structural, for example, cyclic, and quantitative as a numerically expressible identity relation. The object of special interest there is a symbolic representation of a $\mathbb{Z}$-valuation, with $\mathbb{Z}$ denoting the set of integers. A symbolic representation defines an equivalence relation between $\mathbb{Z}$-valuations and contains all the relevant information to evaluate constraints. This might be interpreted as a hint that quantizing qualitative surveys does not necessarily reduce the information content in an inappropriate manner if a valuation similar to a $\mathbb{Z}$-valuation is utilized. In [15], Herzberg explores the relationship between propositional model theory and social decision making via premise-based procedures. In condensed form, it is shown that certain ultrafilters, which in the context of social choice are decisive coalitions, are in one-to-one correspondence with certain kinds of judgment aggregation functions constructed as ultraproducts. A special result is an “Impossibility theorem for finite electorates” on judgment aggregation functions, that is, if the population is endowed with some measure-theoretic or topological structure, there exists a single overall consistent aggregation.

#### 2. Interlock Qualitative and Quantitative Concepts

##### 2.1. From Quantitative Results to Qualitative Insights

Fuzzy logic-based transformations are not the only options for qualitizing examined in the literature. The transformation from quantitative measures into qualitative assessments of software systems via judgment functions is studied in [16]. Based on Dempster-Shafer belief functions, certain objects from the realm of the mathematical theory of evidence [17], Kłopotek and Wierzchon utilized exemplified decision tables as a (probability) measure of diversity in relational databases. The authors viewed the Dempster-Shafer belief functions as a subjective uncertainty measure, a kind of generalization of the Bayesian theory of subjective probability, and showed a correspondence to the join operator of relational database theory. This rough set-based representation of belief function operators then led to a nonquantitative interpretation. As a more direct approach, the net balance statistic, that is, the percentage of respondents replying “up” less the percentage replying “down”, is utilized in [18] as a qualitative yardstick to indicate the direction (up, same, or down) and size (small or large) of the year-on-year percentage change of corresponding quantitative data of a particular activity.
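The net balance statistic described above can be sketched in a few lines. This is a minimal illustration only: the function name `net_balance` and the sample responses are hypothetical and not taken from [18].

```python
def net_balance(responses):
    """Percentage replying 'up' minus percentage replying 'down'."""
    total = len(responses)
    if total == 0:
        raise ValueError("no responses given")
    up = sum(1 for r in responses if r == "up")
    down = sum(1 for r in responses if r == "down")
    return 100.0 * (up - down) / total


# Hypothetical survey: 45 "up", 30 "same", 25 "down" out of 100 answers.
sample = ["up"] * 45 + ["same"] * 30 + ["down"] * 25
balance = net_balance(sample)
print(balance)  # positive value indicates an overall "up" tendency
```

The sign of the result indicates the direction and its magnitude the size of the reported tendency; "same" answers only enter through the total count.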

The following real life-based example demonstrates how misleading pure counting-based tendency interpretation might be and how important a valid choice of parametrization appears to be especially if an evolution over time has to be considered.

*Example 1 (A Misleading Interpretation of Pure Counts).* The situation of the case study is as follows: projects are requested to answer, as a self-assessment, an ordinal-scaled survey about alignment with and adherence to a specified procedure-based process framework. Then the 104 survey questions are worked through with a project-external reviewer in an “initial review”. Based on these review results, improvement recommendations are given to the project team. After a certain period of time a follow-up review is performed. So three samples are available: self-assessment, initial review, and follow-up. For such timeline-dependent data gathering, the cumulated overall counts per scale value are useful to calculate approximation slopes and allow some insight into how the overall project behavior evolves. Now consider the pure counts of changes: from self-assessment to initial review they amount to 5% of the total count, and from the initial review to the follow-up to 12,5%. It would be misleading to interpret this as the follow-up having a greater effect than the initial review. Obviously the follow-up is not independent of the initial review, since recommendations were given at the initial review. A better effectiveness comparison is provided by statistically relevant expressions like the variance. For the self-assessment the answer variance was 6,3(%), for the initial review 5,4(%), and for the follow-up 5,2(%). This leads to the relative effectiveness rates shown in Table 1.

A variance-expression is the one-dimensional parameter of choice for such an effectiveness rating since it is a deviation measure on the examined subject-matter. The mean (or median or mode) values of alignment are not as applicable as the variances since they are too subjective at the self-assessment, and with high probability the follow-up means are expected to increase because of the outlined improvement recommendations given at the initial review.

Thereby, the empirical unbiased question-variance is calculated from the survey results with as the th answer to question and the according expected single question means , that is, In contrast to the one-dimensional full sample mean which is identical to the summing of the single question means , is not identical to the unbiased empirical full sample variance Also it is not identical to the expected answer mean variance where by the answer variance at the th question is It is a qualitative decision to use triggered by the intention to gain insights of the overall answer behavior. The full sample variance might be useful at analysis of single project answers, in the context of question comparison and for a detailed analysis of the specified single question. So is useful to evaluate the applied compliance and valuation criteria or to determine a predefined review focus scope. In fact, to enable such a kind of statistical analysis it is needed to have the data available as, respectively, transformed into, an appropriate numerical coding.
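The distinction between a question-based variance and the full sample variance can be illustrated numerically. The sketch below rests on assumptions: the question-variance is read as squared deviations from each question's own mean, pooled over all answers, with the denominator "number of answers minus number of questions" chosen here for illustration; the data are hypothetical.

```python
from statistics import mean, variance

# Hypothetical coded answers: one list per question, one entry per project.
answers = {
    "q1": [1.0, 1.0, 0.5, 1.0],
    "q2": [0.0, 0.5, 0.0, 0.5],
    "q3": [0.5, 0.5, 1.0, 0.5],
}


def pooled_question_variance(by_question):
    """Pool squared deviations from each question's own mean.

    Assumed denominator: total number of answers minus number of questions.
    """
    squared_sum = 0.0
    count = 0
    for values in by_question.values():
        question_mean = mean(values)
        squared_sum += sum((x - question_mean) ** 2 for x in values)
        count += len(values)
    return squared_sum / (count - len(by_question))


def full_sample_variance(by_question):
    """Unbiased variance of all answers around the single overall mean."""
    all_values = [x for values in by_question.values() for x in values]
    return variance(all_values)


pooled = pooled_question_variance(answers)
full = full_sample_variance(answers)
print(pooled, full)
```

Because the question means differ, the full sample variance also contains the spread between questions, so the two figures do not coincide.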

##### 2.2. Transforming Qualitative Data for Quantitative Analysis

The research on and application of quantitative methods to qualitative data has a long tradition. The method of “Equal-Appearing Interval Scaling” is due to [19]. Essentially, a set of statements related to an attitude is partitioned into groups, using the median value of the single statements as grouping criterion, and a representative statement is chosen from each group (e.g., to create a survey). A single statement's median is calculated from the “favourableness” towards the attitude that a group of judging evaluators assigns to the statement on a given scale. A link with an example can be found at [20] (Thurstone Scaling). The technique of correspondence analysis, for instance, goes back to research in the 1940s; for a compendium about its history see Gower [21]. Correspondence analysis is also known under different synonyms like optimal scaling, reciprocal averaging, quantification method (Japan), or homogeneity analysis, and so forth [22]. Young refers to correspondence analysis and canonical decomposition (synonyms: parallel factor analysis or alternating least squares) as theoretical and methodological cornerstones for the quantitative analysis of qualitative data. The great efficiency of applying principal component analysis to nominal scaling is shown in [23], which gives a nice example of an analysis of business communication in the light of negotiation probability. The authors introduced a five-stage approach to transforming a qualitative categorization into a quantitative interpretation (material sourcing, transcription, unitization, categorization, nominal coding). The issues related to a timeline reflecting the longitudinal organization of data, exemplified in the case of life histories, are of special interest in [24]. There, so-called Self-Organizing Maps (SOMs) are utilized. SOMs are a technique of data visualization that accomplishes a reduction of data dimensions and displays similarities.
The authors consider SOMs as a nonlinear generalization of principal component analysis to deduce a quantitative encoding by applying a life history clustering algorithm based on the Euclidean distance (vectors in Euclidean space). Belief functions, to a certain degree a linkage between relation modelling and factor analysis, are studied in [25]. The authors used them to generate numeric judgments from nonnumeric inputs in the development of approximate reasoning systems utilized as a practical interface between the users and a decision support system. Another way to apply probabilities to qualitative information is given by the so-called “Knowledge Tracking (KT)” methodology as described in [26]. The idea is to determine relations in qualitative data to get a conceptual transformation and to allocate transition probabilities accordingly. The emerging cluster network sequences are thus captured with a numerical score (“goodness of fit score”) which expresses how well a relational structure explains the data. Since such a listing of numerical scores can be ordered by the less-than-or-equal (≤) relation, KT provides an ordinal scaling. Limitations of ordinal scaling in the clustering of qualitative data from the perspective of phenomenological analysis are discussed in [27].

#### 3. Scaling

It is a well-known fact that parametric statistical methods, for example, ANOVA (Analysis of Variance), need some kind of standardization of the gathered data to enable the comparable usage and determination of relevant statistical parameters like mean, variance, correlation, and other distribution-describing characteristics. A survey about conceptual data gathering strategies and context constraints can be found in [28]. One of the basics is the underlying scale assigned to the gathered data. The main types of numerically (real number) expressed scales are

(i) nominal scale, for example, gender coding like “male = 0” and “female = 1”,

(ii) ordinal scale, for example, ranks; its difference to a nominal scale is that the numeric coding implies, respectively, reflects, an (intentional) ordering (≤),

(iii) interval scale, an ordinal scale with well-defined differences, for example, temperature in °C,

(iv) ratio scale, an interval scale with a true zero point, for example, temperature in K,

(v) absolute scale, a ratio scale with an (absolute) prefixed unit size, for example, inhabitants.

Let us first look at the difference between a ratio and an interval scale: the true or absolute zero point enables statements like “20 K is twice as hot as 10 K” to make sense, while the same statement for 20°C and 10°C holds relative to the °C-scale only but not “absolutely”, since 293,15 K is not twice as “hot” as 283,15 K. Interval scales allow valid statements like the following: let the temperature on day A = 25°C, on day B = 15°C, and on day C = 20°C. Then the ratio (A−B)/(A−C) = 2 validates “The temperature difference between day A and day B is twice as much as between day A and day C”.
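The temperature example can be checked numerically. The following sketch (the helper name `celsius_to_kelvin` is chosen here for illustration) shows that ratios of differences survive the change from °C to K, whereas ratios of the values themselves do not.

```python
def celsius_to_kelvin(celsius):
    """Shift from the Celsius interval scale to the Kelvin ratio scale."""
    return celsius + 273.15


day_a, day_b, day_c = 25.0, 15.0, 20.0

# Ratio of differences: meaningful on an interval scale, unchanged by the shift.
diff_ratio_c = (day_a - day_b) / (day_a - day_c)
diff_ratio_k = (celsius_to_kelvin(day_a) - celsius_to_kelvin(day_b)) / (
    celsius_to_kelvin(day_a) - celsius_to_kelvin(day_c)
)

# Ratio of values: only meaningful on the ratio scale (Kelvin).
value_ratio_c = 20.0 / 10.0
value_ratio_k = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print(diff_ratio_c, diff_ratio_k)    # both approximately 2
print(value_ratio_c, value_ratio_k)  # 2.0 versus roughly 1.035
```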

As mentioned in the previous sections, nominal scaling permits nonparametric methods or (distribution-free) approaches such as principal component analysis. Examples of nominal and ordinal scaling are provided in [29]. A distinction of ordinal scales into ranks and scores is outlined in [30]. While ranks only provide an ordering relative to the other items under consideration, scores enable a more precise idea of distance and can have an independent meaning. If a score in fact has an independent meaning, that is, it is meaningfully usable not only for the items observed but through an independently defined difference, then the score provides an interval scale. An ordering is called strict if and only if “<” holds.

*Example 2 (Rank to score to interval scale).* Let us evaluate the response behavior of an IT system. The evaluation answers, ranked in ascending order according to a qualitative ordinal judgement scale, are deficient (failed), acceptable (partial), and comfortable (compliant). Now let us assign “acceptance points” to these three ranks to construct a score of “weighted ranking”. This gives an idea of (subjective) distance: 5 points are needed to reach “acceptable” from “deficient” and a further 3 points to reach “comfortable”. But from an interpretational point of view, an interval scale should fulfill that the five points from “deficient” to “acceptable” are in fact 5/3 of the three points from “acceptable” to “comfortable” (well-defined) and that the same score is applicable to other IT systems too (independence). Therefore consider time savings as a “throughput” measure: “deficient” = losing more than one minute = −1; “acceptable” = between losing one minute and gaining one = 0; “comfortable” = gaining more than one minute = 1. For a fully well-defined situation, assume context constraints so that not more than two minutes can be gained or lost. So from “deficient” to “comfortable”, the distance will always be “two minutes”.

##### 3.1. Transforming Ordinal Scales into Interval Scales

Lemma 1. *Each strict score with finite index set can be bijectively transformed into an order preserving ranking with .*

*Proof. *Since the index set is finite is a valid representation of the index set and the strict ordering provides to be the minimal scoring value with if and only if . Thus is the desired mapping.
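The idea behind Lemma 1 can be sketched as code: a strict score on a finite set of items is sorted, and each item receives its position as rank. The function name `scores_to_ranks` and the score values are hypothetical and serve as illustration only.

```python
def scores_to_ranks(scores):
    """Map pairwise distinct scores to ranks 1..n, preserving their order."""
    values = list(scores.values())
    if len(set(values)) != len(values):
        raise ValueError("scores must be strict (pairwise distinct)")
    ordered_items = sorted(scores, key=scores.get)
    return {item: rank for rank, item in enumerate(ordered_items, start=1)}


# Hypothetical strict score on three judgement categories.
scores = {"comfortable": 4.5, "deficient": -3.5, "acceptable": 1.5}
ranks = scores_to_ranks(scores)
print(ranks)
```

The mapping is a bijection between the items' scores and the ranks 1..n, and a smaller score always yields a smaller rank. Ties are rejected because a strict score is assumed.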

Aside from the rather abstract “”, there is a calculus of the weighted ranking with and which is order preserving and since for all it provides the desired (natural) ranking . Of course qualitative expressions might permit two or more items to occupy equal rank in an ordered listing, but when numeric values are assigned, differentiation aspects are lost if different items are represented by the same numeral. Approaches to transform (survey) responses expressed as (nonmetric) judgments on an ordinal scale to an interval (or synonymously “continuous”) scale, so that statistical methods can perform quantitative multivariate analysis, are presented in [31]. There, a transformation based on the decomposition into orthogonal polynomials (derived from certain matrix products) is introduced, which is applicable if equally spaced integer-valued scores, so-called natural scores, are used. The principal transformation approaches proposed by psychophysical theory, with the original intensity as judge evaluation, are also mentioned there.

*Fechner's law*

with constant l in .

*Stevens' Power Law*

where depends on the number of units and is a measure of the rate of growth of perceived intensity as a function of stimulus intensity.

*The Beidler Model*

with constant usually close to 1.
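Two of the psychophysical transformations named above can be sketched as follows. The formulas used are the common textbook forms (a logarithmic law for Fechner and a power law for Stevens) and are an assumption here; the parametrization used in [31] may differ. All function names, parameter names, and constants are hypothetical.

```python
import math


def fechner(intensity, k=1.0, threshold=1.0):
    """Textbook form of Fechner's law: k * ln(intensity / threshold)."""
    return k * math.log(intensity / threshold)


def stevens(intensity, k=1.0, exponent=0.5):
    """Textbook form of Stevens' power law: k * intensity ** exponent."""
    return k * intensity ** exponent


def is_increasing(values):
    """True if the sequence is strictly increasing (ordering preserved)."""
    return all(a < b for a, b in zip(values, values[1:]))


ranks = [1, 2, 3, 4, 5]
fechner_scale = [fechner(r) for r in ranks]
stevens_scale = [stevens(r) for r in ranks]
reversed_scale = [stevens(r, exponent=-1.0) for r in ranks]

print(is_increasing(fechner_scale), is_increasing(stevens_scale))
print(is_increasing(reversed_scale))  # a negative exponent reverses the order
```

The last line illustrates why the choice of constants matters: with unsuitable constants, the original ordering of the ranks is not preserved.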

Here, determining the constants can be problematic, as can the possible loss of the original ordering. From Lemma 1, on the other hand, we see that given a strict ranking of ordinal values only, additional (qualitative context) constraints might need to be considered when assigning a numeric representation. Of course each such condition will introduce tendencies. So without further calibration requirements it follows:

Consequence 1. *An equidistant interval scaling which is symmetric and centralized with respect to the expected scale mean minimizes dispersion and skewness effects of the scale.*

*Proof. *If , let . Since
and the symmetry condition holds for each , there exists an with . Thus the centralized second moment reduces to
and the third, since , to

*Remark 1.* For , the symmetry condition (for there is an with ) reduces the centralized second moment to
In case of a strict score even to

*Example 3. *Let us return to the samples of Example 1. The predefined answer options are “fully compliant ()”, “partial compliant ()”, “failed ()”, and “not applicable ()”. In fact it turns out that the participants add a fifth namely, “no answer” = “blank”. A qualitative view gives since should be neither positive nor negative in impact whereas indicates a high probability of negative impact. The interpretation of “no answer” tends to be rather nearby than at —“not considered” is rather “failed” than a sound judgment. Finally to assume “blank” or “blank” is a qualitative (context) decision. So due to the odd number of values the scaling, , , , “blank” , and may hold. In sense of a qualitative interpretation, a 0-1 (nominal) only answer option does not support the valuation mean () as an answer option and might be considered as a class predifferentiator rather than as a reliable detail analysis base input.

Corollary 1. *Each (strict) ranking , and so each score, can be consistently mapped into via .*

*Proof. *Clearly
is strictly monotone increasing since − and it gives . Furthermore, and Var() = for the variance under linear shows the consistent mapping of -ranges.
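A linear mapping of ranks into a unit interval can be sketched as below. The concrete map (rank − 1)/(n − 1) onto [0, 1] is an assumption chosen for illustration, not necessarily the map of Corollary 1; the sample of ranks is hypothetical. The check at the end uses the general fact that a linear map a·x + b scales the variance by a².

```python
from statistics import pvariance


def to_unit_interval(rank, n):
    """Assumed linear map of the ranks 1..n onto [0, 1]."""
    if n < 2:
        raise ValueError("at least two ranks are required")
    return (rank - 1) / (n - 1)


n = 5
ranks = [1, 2, 2, 3, 5, 4, 5]  # hypothetical observed ranks
mapped = [to_unit_interval(r, n) for r in ranks]

scale = 1 / (n - 1)
variance_ranks = pvariance(ranks)
variance_mapped = pvariance(mapped)
print(variance_mapped, scale ** 2 * variance_ranks)  # agree up to rounding
```

The map is strictly increasing, so the ordering of the ranks is kept, and the variances of the original and the mapped values differ only by the constant factor `scale ** 2`.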

*Remark 2. *Generally such target mapping interval transformations can be viewed as a “microscope effect” especially if the inverse mapping from [] into a larger interval is considered. Qualitative interpretations of the occurring values have to be done carefully since it is not a representation on a ratio or absolute scale. Similar magnifying effects are achievable by applying power or root functions to values out of interval [].

*Remark 3. *The values out of [] associated to (ordinal) rank are not the probabilities of occurrence. Let us look again at Examples 1 and 3. Each sample event is mapped onto a value (; here ). Thus each with depending on (). Let denote the total number of occurrence of and let the full sample with . Then the (empirical) probability of occurrence of is expressed by . In fact
Analog with as the total of occurrence at the sample block of question ,
And thus it gives as the expected mean of
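The relation between scale values, their empirical probabilities of occurrence, and the expected mean can be illustrated with a small sketch. The sample values and function names are hypothetical; the point is only that the scale values themselves are not probabilities, while the relative frequencies are.

```python
from collections import Counter


def empirical_distribution(sample):
    """Relative frequency of each scale value in the sample."""
    counts = Counter(sample)
    total = len(sample)
    return {value: count / total for value, count in counts.items()}


def expected_mean(sample):
    """Mean as the sum of scale values weighted by their relative frequency."""
    distribution = empirical_distribution(sample)
    return sum(value * p for value, p in distribution.items())


# Hypothetical coded answers on a scale with values 0, 0.5, and 1.
sample = [1.0, 1.0, 0.5, 0.0, 1.0, 0.5, 0.0, 1.0]
distribution = empirical_distribution(sample)
print(distribution)
print(expected_mean(sample), sum(sample) / len(sample))  # identical means
```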

*Remark 4. *The essential empiric mean equation “” is nicely outlining the intended weighting through the actual occurrence of the value but also that even a weak symmetry condition only, like , might already cause an inappropriate bias.

#### 4. Analysis Modelling

The key to analysis approaches aimed at determining areas of potential improvement is an appropriate underlying model that provides reasonable theoretical results, which are compared with and put into relation to the measured empirical input data.

##### 4.1. Remarks on the Reliability of Statistical Result Analysis

Perhaps the most frequent assumptions mentioned when applying mathematical statistics to data are the Normal distribution (Gauß' bell curve) assumption and the (stochastic) independence assumption for the data sample (for elementary statistics see, e.g., [32]). For both a $\chi^2$-test can be utilized. The Normal-distribution assumption is also coupled with the sample size. If the sample size is large enough, the central limit theorem allows assuming a Normal distribution; at smaller sizes a Kolmogorov-Smirnov test or an appropriate variation may apply. The Normal-distribution assumption is the basis for the applicability of most statistical hypothesis tests used to gain reliable statements. The independence assumption is typically utilized to ensure that the calculated estimation values reflect the underlying situation in an unbiased way. The $\chi^2$-independence testing is realized with contingency tables. In any case it is essential to be aware of the relevant testing objective.

What are we looking for being normally distributed in Example 1 and why? A quite direct answer is “looking for the distribution of the answer values to be used in statistical analysis methods”. Thereby the marginal mean values of the questions as well as the marginal mean values of the surveys in the sample are showing up as the overall mean value (cf. (2)).

Applying a Kolmogorov-Smirnov test to the marginal means forces the selected scoring values to pass a validity check at the test's allocated $\alpha$-significance level. That is, if the Normal-distribution hypothesis cannot be supported on significance level $\alpha$, the chosen valuation might be interpreted as inappropriate. Briefly, the maximum difference between the cumulated ranking weight of the marginal means (at descending ordering, the [total number of ranks minus actual rank] divided by the total number of ranks) and their expected result should be small enough, for example, lower than 1,36/$\sqrt{n}$ for $\alpha$ = 0,05 and lower than 1,63/$\sqrt{n}$ for $\alpha$ = 0,01. For $n$ = 104 this evolves to (rounded) 0,13 and 0,16, respectively. In case of Example 3 and initial reviews the maximum difference appears to be . Thus for $\alpha$ = 0,01 the Normal-distribution hypothesis is acceptable. In case of switching and “blank”, it shows 0,09 as calculated maximum difference. In case of , , , and and “blank” not counted, the maximum difference is 0,29 and so the Normal-distribution hypothesis has to be rejected for $\alpha$ = 0,05 and $\alpha$ = 0,01, that is, neither an inappropriate rejection of 5% nor of 1% of normally distributed sample cases allows the general assumption of the Normal-distribution hypothesis in this case. The same test results show up for the case study with the -type marginal means ( = 37). The symmetry of the Normal distribution and the fact that the interval [$\mu - \sigma$, $\mu + \sigma$] contains ~68% of observed values allow a special kind of “*quick check*”: “if exceeds the sample values at all, the Normal-distribution hypothesis should be rejected”, which appears in the case study at the and “blank” not counted case. The statistical independence of random variables ensures that calculated characteristic parameters (e.g., unbiased estimators) allow a significant and valid interpretation. 
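The critical bounds quoted above can be reproduced with a short sketch. It uses the large-sample coefficients 1.36 (5% level) and 1.63 (1% level) from the text and the sample size 104; the function names are hypothetical.

```python
import math

# Large-sample Kolmogorov-Smirnov coefficients as quoted in the text.
KS_COEFFICIENTS = {0.05: 1.36, 0.01: 1.63}


def ks_critical_value(sample_size, alpha):
    """Approximate critical bound: coefficient / sqrt(sample size)."""
    return KS_COEFFICIENTS[alpha] / math.sqrt(sample_size)


def normality_acceptable(max_difference, sample_size, alpha):
    """True if the observed maximum difference stays below the bound."""
    return max_difference < ks_critical_value(sample_size, alpha)


bound_5 = ks_critical_value(104, 0.05)
bound_1 = ks_critical_value(104, 0.01)
print(round(bound_5, 2), round(bound_1, 2))

# The two maximum differences reported in the text: 0.09 and 0.29.
print(normality_acceptable(0.09, 104, 0.05))
print(normality_acceptable(0.29, 104, 0.01))
```

A maximum difference of 0.09 stays below both bounds, whereas 0.29 exceeds both, which matches the acceptance and rejection statements in the text.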
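To apply $\chi^2$-independence testing with $(r-1)(s-1)$ degrees of freedom, a contingency table $(n_{ij})$ is utilized, where $n_{ij}$ counts the common occurrence of observed characteristic $i$ out of index set $\{1,\dots,r\}$ and $j$ out of index set $\{1,\dots,s\}$, together with the test statistic (a dot indicates a marginal sum; $n = n_{\cdot\cdot}$)
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \frac{\left(n_{ij} - n_{i\cdot}\,n_{\cdot j}/n\right)^2}{n_{i\cdot}\,n_{\cdot j}/n}.$$
So on significance level $\alpha$ the independence assumption has to be rejected if $\chi^2 > \chi^2(1-\alpha;\,(r-1)(s-1))$, the $(1-\alpha)$ quantile of the $\chi^2$-distribution with $(r-1)(s-1)$ degrees of freedom.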
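The independence test on a contingency table can be sketched in plain Python. The statistic below is the standard Pearson chi-square statistic (observed versus expected counts under independence); the table is hypothetical, and the critical value 3.841 is the well-known 95% quantile of the chi-square distribution with one degree of freedom.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("every row and column needs at least one count")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2x2 table: answer values of project A (rows) vs. project B (columns).
table = [[20, 10], [10, 20]]
statistic, dof = chi_square_statistic(table)

CRITICAL_95_DOF_1 = 3.841  # 95% quantile, chi-square with 1 degree of freedom
reject_independence = statistic > CRITICAL_95_DOF_1
print(statistic, dof, reject_independence)
```

For larger tables the matching quantile has to be looked up for the resulting degrees of freedom, for example from a statistics table or a statistics library.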

Looking at the case study, the colloquial “the answers to the questionnaire should be given independently” needs to be stated more precisely. Of course independence can be checked for the gathered data project by project as well as for the answers by appropriate $\chi^2$-tests. At the project-by-project level, the independence of the responses of project A and project B can be checked with as the count of “answers with value “” at project A and answer value “” at project B”. This independence tells us that one project is not giving an answer because another project has given a specific answer. So a distinction and separation of timeline-dependent repeated data gathering from within the same project is recommendable.

Regarding the relationship among the answers, it is neither intended a priori nor expected that the questions and their results are always statistically independent, especially not if they relate to the same superior procedural process grouping or aggregation. Of greater interest is how strong and deep a relationship or dependency might be. For normally distributed random variables it is a well-known fact that independence is equivalent to being uncorrelated (e.g., [32]). Here the (Pearson) correlation coefficient of $X$ and $Y$ is defined through $\rho_{X,Y} = \operatorname{Cov}(X,Y)/(\sigma_X\,\sigma_Y)$, with $\sigma_X$ and $\sigma_Y$ as the standard deviations of $X$ and $Y$, respectively, and $\operatorname{Cov}(X,Y)$ as their covariance. Thus it also allows a “quick check/litmus test” for independence: “if the (empirical) correlation coefficient exceeds a certain value the independence hypothesis should be rejected”. Of course there are also exact tests available for $\rho$, for example, for $\rho = 0$ from a $t$-distribution test statistic or, for a hypothesized real value of $\rho$, from the normal distribution [32].
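The empirical correlation coefficient and the usual test statistic for a zero correlation can be sketched as follows. The statistic r·sqrt((n − 2)/(1 − r²)) is the standard textbook form, which follows a t-distribution with n − 2 degrees of freedom under normality; the answer vectors and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Empirical Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


def t_statistic(r, n):
    """Test statistic for the hypothesis of zero correlation (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical coded answers of five projects to two questions.
question_1 = [0.0, 0.5, 1.0, 0.5, 1.0]
question_2 = [0.0, 0.5, 0.5, 1.0, 1.0]

r = pearson(question_1, question_2)
t = t_statistic(r, len(question_1))
print(r, t)
```

The value of `t` is then compared with the quantile of the t-distribution with n − 2 degrees of freedom at the chosen significance level.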

##### 4.2. Extended Modelling with Correlation Coefficients

In the sense of our case study, the straightforward interpretation of the answer correlation coefficients (note that we are not using Spearman's rho here) allows us to identify questions within the survey that are potentially obsolete (strong positive correlation with another question) or contrary (strong negative correlation). The expressed measure of linear dependency points out overlapping areas (positive correlation) or potential conflicts (negative correlation). Aside from this straightforward usage, correlation coefficients are also a subject of contemporary research, especially in principal component analysis (PCA), for example, as mentioned earlier in [23], or in the analysis of Hebbian artificial neural network architectures, where the eigenvectors of the correlation matrix associated with a given stochastic vector are of special interest [33]. In conjunction with the $\alpha$-significance level of the coefficient testing, some additional meta-modelling variables may apply. So the absolute value of recognized correlation coefficients may have to exceed a defined lower limit before being taken into account; aggregation within specified value ranges of the coefficients may be represented by the ranges' mean values; the sign as such may be ignored; or combinations of these options are possible.

At least in situations with a predefined questionnaire, as in the case study, the single questions are intentionally assigned to a higher-level aggregation concept, that is, not only will PCA provide grouping aspects but there is also a predefined intentional relationship definition. In our case study, these are the procedures of the process framework. Such (qualitative) predefined relationships typically exhibit the following two quantifiable construction parameters:

(i) a weighting function outlining the “relevance” or “weight” of the lower-level object relative to the higher-level aggregate,

(ii) the number of allowed low-to-high-level allocations.

For example, such an initial relationship indicator matrix, with the procedures given per row and the allocated questions as columns, with a constant weight interpreted as full adherence to the indicated allocation, and with a (directed) 1 : 1 question-procedure relation as the “primary main procedure allocation” of the questions, will give, if ordered appropriately, a roughly block-diagonal relation structure. An approach to gain value from both views is a model combining the (experts') presumed weighted relation matrix with the empirically determined, PCA-relevant correlation coefficient matrix.

Such a scheme is described by linear aggregation modelling. In addition to the meta-modelling variables “magnitude and validity of correlation coefficients” and the application of “value range means representation” to the matrix multiplication result, a normalization transformation appears to be expedient. Some obvious but relative normalization transformations are disputable:

(1) This might be interpreted that the will be 100% relevant to aggregate in row but there is no reason to assume in case of that the column object being less than 100% relevant to aggregate which happens if the maximum in row is greater than .

(2) Also the transformation is indeed keeping the relative portion within the aggregates and might be interpreted as 100% coverage of the row aggregate through the column objects, but it collaterally assumes disjoint coverage by the column objects too.

(3) Most appropriate in usage, and similar to the eigenvector representation in PCA, is the normalization via the (Euclidean) length , that is, in relation to the aggregation object and the row vector , the transformation yields, since the length of the resulting row vector equals 1, a 100% interpretation coverage of aggregate , providing the relative portions and allowing conjunctive input of the column-defining objects.
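The three row normalizations discussed above can be compared in a small sketch. It reads option (1) as division by the row maximum, option (2) as division by the row sum, and option (3) as division by the Euclidean row length; this reading, the function name `normalize_rows`, and the matrix values are assumptions made for illustration.

```python
import math


def normalize_rows(matrix, mode):
    """Normalize each row by its maximum, its sum, or its Euclidean length."""
    result = []
    for row in matrix:
        if mode == "max":
            divisor = max(abs(v) for v in row)
        elif mode == "sum":
            divisor = sum(abs(v) for v in row)
        elif mode == "euclid":
            divisor = math.sqrt(sum(v * v for v in row))
        else:
            raise ValueError("mode must be 'max', 'sum', or 'euclid'")
        if divisor == 0:
            raise ValueError("cannot normalize a zero row")
        result.append([v / divisor for v in row])
    return result


# Hypothetical aggregation matrix: rows are aggregates, columns are questions.
matrix = [[3.0, 4.0, 0.0], [1.0, 1.0, 2.0]]

by_max = normalize_rows(matrix, "max")
by_sum = normalize_rows(matrix, "sum")
by_length = normalize_rows(matrix, "euclid")
print(by_length[0])  # row of Euclidean length 1
```

Only the Euclidean variant yields row vectors of length 1, which is the property the text relates to the eigenvector representation in PCA.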

Now with as the unit-matrix and , we can assume with standard error as the aggregation level built-up statistical distribution model (e.g., questions → procedures). In terms of the case study, the aggregation to procedure level built-up model based on given answer results is expressible as (see (24) and (25)) Notice that with transformation applied and since implies it holds Recall will be a natural result if the underlying scaling is from within []. If appropriate, for example, for reporting reasons, might be transformed according or according to Corollary 1. Also notice that matches with the common PCA modelling base. With as an eigenvector associated with eigenvalue of an idealized heuristic ansatz to measure consilience results in So under these terms the “difference” of the model compared to a PCA model depends on (). This suggests that a predefined indicator matrix aggregation equivalent to a stricter diagonal block structure scheme might compare better to an empirically derived PCA grouping model than otherwise (cf. also topological ultra-filters in [15]). This is comprehensible because of the orthogonality of the eigenvectors, but a component-by-component disjunction is not necessarily required. In fact a straightforward interpretation of the correlations might be useful, but for practical purposes and from a practitioner's view, referencing only the maximal aggregation level is not always desirable. So for evaluation purposes ultrafilters, multilevel PCA sequence aggregations (e.g., in terms of the case study: PCA on questions to determine procedures, PCA on procedures to determine processes, PCA on processes to determine domains, etc.), or too broadly based predefined aggregation might prevent the desired granularity for analysis.

##### 4.3. Adherence of Gathered Data to Aggregation Model

In order to answer how well observed data adhere to the specified aggregation model, it is feasible to calculate the aberration as a function induced by the empirical data and the theoretical prediction. Formally expressed through Thereby the adherence () to a single aggregation form ( in ) is of interest. So let whereby is the calculation result of a comparison of the aggregation represented by the th row-vector of and the effect triggered by the observed . Notice that in the notion of the case study is considered and equals “everything is fully compliant” with no aberration and holds. So options of are given through

(1) compared to and adherence formula: but this can be formally only valid if and have the same sign since the theoretical min () = 0 already expresses full incompliance.

(2) Let * denote a component-by-component multiplication so that = . Similarly as in (30), an adherence measure based on disparity (in the sense of a “length” comparison) is provided by Notice that gives .

(3) An azimuth measure of the angle between and

The orientation of the vectors in the underlying vector space, that is, simply spoken, whether a vector is on the “left” or “right” side of the other, does not matter in the sense of adherence measurement and is finally evaluated by an examination analysis of the single components' characteristics. In addition the constraint max () = 1, that is, full adherence, has to be considered too. So let us specify under assumption and with as a consequence from scaling values out of []: Recall that the following generally holds Thus for we get Since and are independent from the length of the examined vectors, we might apply and . So let . And since holds, which is shown by thus evolves to Simultaneous application of and will give a kind of cross “check & balance” to validate and complement each other as adherence metric and measurement.
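The interplay of a length-based and an angle-based measure can be illustrated numerically. The sketch below is one plausible reading of options (2) and (3), not the exact formulas of this section: it assumes that the observed effect is the component-by-component product of a model row vector and the observed values, that all values are scaled into [0, 1], and that the row vector is nonzero. The functions `length_adherence` and `angle_adherence` and the sample vectors are hypothetical.

```python
import math

def length(v):
    """Euclidean length of a vector given as a list of numbers."""
    return math.sqrt(sum(c * c for c in v))

def length_adherence(weights, observed):
    """Ratio of the length of the observed effect to the model row length.

    Assumes a nonzero weights vector; equals 1 if all observations are 1.
    """
    effect = [w * x for w, x in zip(weights, observed)]
    return length(effect) / length(weights)

def angle_adherence(weights, observed):
    """Cosine of the angle between the model row and the observed effect."""
    effect = [w * x for w, x in zip(weights, observed)]
    denom = length(weights) * length(effect)
    if denom == 0:
        return 0.0  # no observed effect at all (assumed convention)
    return sum(w * e for w, e in zip(weights, effect)) / denom

# Hypothetical model row (already normalized) and observation vectors.
weights = [0.7, 0.7, 0.1, 0.1]
fully_compliant = [1.0, 1.0, 1.0, 1.0]
uniform_half = [0.5, 0.5, 0.5, 0.5]
mixed = [1.0, 0.5, 0.0, 1.0]

results = {
    name: (length_adherence(weights, obs), angle_adherence(weights, obs))
    for name, obs in [("full", fully_compliant),
                      ("half", uniform_half),
                      ("mixed", mixed)]
}
```

In this reading a uniformly halved observation keeps the direction of the model row (angle measure close to 1) while the length measure drops to about 0.5. This is the kind of complementary behaviour that motivates applying both measures together.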

The relevant areas to identify high and low adherence results are defined by not being inside the interval (mean ± standard deviation). An interpretation as an expression of percentage or of prespecified fulfillment goals is doubtful for all metrics without further calibration specification other than that 100% equals fully adherent and 0% is totally incompliant (cf. Remark 2). The same high-low classification of value ranges might apply to the set of the . But the interpretation of a is more an expression of the observed weight of an aggregate within the full set of aggregates than a compliance measure of fulfilling an explicit aggregation definition.
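The high-low classification by the interval (mean ± standard deviation) can be sketched as follows. This is a minimal illustration with hypothetical adherence values; the use of the sample standard deviation (rather than the population one) and the label "mid" for values inside the interval are assumptions.

```python
import statistics

def classify(values):
    """Label each value as 'high', 'low', or 'mid' relative to mean ± sd."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    labels = []
    for v in values:
        if v > mean + sd:
            labels.append("high")
        elif v < mean - sd:
            labels.append("low")
        else:
            labels.append("mid")
    return labels

# Hypothetical adherence values for six aggregates.
adherence = [0.90, 0.55, 0.60, 0.58, 0.20, 0.62]
labels = classify(adherence)
```

Only the first and the fifth aggregate fall outside the interval here and would be flagged as candidates for closer inspection.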

##### 4.4. Model Evaluation

In contrast to the model-inherent characteristic adherence measure, the aim of model evaluation is to provide a valuation base from an outside perspective onto the chosen modelling. In [34] Müller and Supatgiat described an iterative optimisation approach to evaluate compliance and/or compliance inspection cost applied to an already given effectiveness-model (indicator matrix) of “measures/influencing factors” determining “(legal regulatory) requirements/classes” as aggregates. The authors also mentioned that it took some hours of computing time to calculate a result. In fact the situation to determine an “optimised” aggregation model is even more complex. Let us recall the defining modelling parameters: (i) the definition of the applied scale and the associated scaling values, (ii) relevance variables of the correlation coefficients ( constant & -level), (iii) the definition of the relationship indicator matrix , (iv) entry value range adjustments applied to .

Instead of a straightforward calculation, a measure of congruence alignment suggests a possible solution. To this end, the observation result vectors and will be compared with the model-inherent expected theoretical estimated values derived from the model matrix . Under the assumption that the modelling reflects the observed situation sufficiently, the appropriate localization and variability parameters should be congruent in some way. That is, the application of a well-defined value transformation makes it possible to use statistical tests to decide whether the observed and the theoretic outcomes can be viewed as samples from the same population.

Let be the observed values and representing the uniquely transformed values.

Analogously, the theoretic model estimating values are expressed as ( transposed) Now the relevant statistical parameter values are The evaluation is now carried out by performing statistical significance testing for with the corresponding hypothesis. Of course thereby the probability (1-) under which the hypothesis is valid is of interest. So it is not a test result for a given significance level that is to be calculated but the minimal (or percentile) under which the hypothesis still holds. The appropriate test statistics on the means (, ) are according to a (two-tailed) Student's t-distribution and on the variances () according to a Fisher's F-distribution. For practical purposes the desired probabilities are ascertainable, for example, with the spreadsheet built-in functions “TTEST” and “FTEST” (e.g., Microsoft Excel, IBM Lotus Symphony, SUN Open Office). Reasonable variation of the defining modelling parameters will therefore provide t-test and F-test results for the direct observation data () and for the aggregation objects (). The ultimate goal is that all probabilities tend towards 1. But this is quite unrealistic, and a decision to accept a model set-up has to take surrounding qualitative perspectives into account too. So it might occur that an improved concordance at the aggregates is coupled with a decrease of a probability value at the observation data side, or any other uncomfortable situation, depending on which of the defining variables is changed. Especially the use of the model theoretic results as a base for improvement recommendations regarding aggregate adherence requires a well-balanced adjustment and an overall rating at a satisfactory level. As a rule of thumb, a well-fitting localizing t-test value at the observed data is considerably more valuable than the associated F-test value, since a correctly predicted mean looks more important for reflecting coincidence of the model with reality than a prediction of the spread of individually triggered responses.
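The two test statistics behind such a comparison of observed and model-estimated values can be sketched with the standard library alone. This is a hedged illustration, not the case-study procedure: it assumes the equal-variance (pooled) two-sample form of the t-test and sample variances with nonzero values, and the names `pooled_t_statistic`, `f_statistic`, `observed`, and `predicted` together with all data are hypothetical. The associated probabilities are not computed here, since they additionally require the t- and F-distribution functions (for example, the spreadsheet functions named above).

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def f_statistic(a, b):
    """Variance ratio (larger over smaller) with its degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1

# Hypothetical observed values and model-estimated values on a [0, 1] scale.
observed = [0.5, 0.75, 1.0, 0.25, 0.5]
predicted = [0.5, 0.5, 0.75, 0.5, 0.25]

t_value, t_df = pooled_t_statistic(observed, predicted)
f_value, f_df_num, f_df_den = f_statistic(observed, predicted)
```

A t value near 0 and an F value near 1 indicate that means and variances of both samples are close, which corresponds to test probabilities tending towards 1.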
The situation is a little different at the aggregates level. Since the aggregates are artificial to a certain degree, the focus of the model may be on explaining the variance rather than on determining the average localization, but with a tendency for both values to be of similar magnitude. As an illustration of the input/outcome variety, the following changing variable value sets applied to the case study data may be considered to outline a potential decision issue (t- and F-test values with = Question, = aggregating procedure): (i) a (specified) matrix with entries either 0 or 1; is resulting in: (ii) as above but with entries “1” substituted from ; and the entries of consolidated at margin and range means :

#### 5. Conclusions and Future Research

The need to evaluate available information and data is steadily increasing. More and more qualitative data resources, such as survey responses, are being utilized. Therefore a methodic approach is needed that consistently transforms qualitative contents into a quantitative form and enables the application of formal mathematical and statistical methodology to gain reliable interpretations and insights that can be used for sound decisions, bridging qualitative and quantitative concepts combined with analysis capability. Avoiding methodic processing gaps requires a continuous and careful embodiment of the influencing variables and underlying examination questions, from the mapping of qualitative statements onto numbers to the point of establishing formal aggregation models which allow quantitative-based qualitative assertions and insights.

This paper depicts mathematical prerequisites and applies statistical methodology to address and investigate this issue. In particular the transformation from ordinal scaling to interval scaling is shown to be optimal if equidistant and symmetric. As an alternative to principal component analysis, an extended modelling is introduced that describes aggregation level models of the observation results based on the matrix of correlation coefficients and a predefined, qualitatively motivated relationship incidence matrix. On such models adherence measurements and metrics are defined and examined which are usable to describe how well the observation fulfills and supports the aggregates' definitions. Finally a method combining t- and F-tests to derive a decision criterion on the fit of the chosen aggregation model is presented. This appears to be required because the multiple parameters influencing the modelling do not result in an analytically usable closed formula to calculate an optimal aggregation model solution. Among these meta-model variables of the mathematical modelling are the scaling range with a rather arbitrary zero-point, preselection limits on the correlation coefficient values and on their statistical significance relevance-level, the predefined aggregates incidence matrix, and normalization constraints. If some key assumptions from statistical analysis theory are fulfilled, like normal distribution and independence of the analysed data, a quantitative aggregate adherence calculation is enabled. Therefore two measurement metrics, namely a dispersion (or length) measurement and an azimuth (or angle) measurement, are established to express quantitatively the qualitative aggregation assessments.

In fact the quantifying method applied to data is essential for the analysis and modelling process whenever observed data has to be analyzed with quantitative methods. Compared to PCA, the presented modelling approach is relatively easy to implement, especially when expert-based preaggregation is considered.

An important usage area of the extended modelling and the adherence measurement is to gain insights into the performance behaviour related to aggregates or category definitions that are not directly evaluable. In the case study this approach and its results have been useful in outlining tendencies and details to identify focus areas of improvement and well-performing process procedures as the examined higher level categories, and their extrapolation into the future.

As a continuation of the studied subject, a qualitative interpretation of , a refinement of the t- and F-test combination methodology, and a deep analysis of the eigenspace characteristics of the presented extended modelling compared to PCA results are conceivable, perhaps in conjunction with estimating questions.

#### Acknowledgments

The author would like to acknowledge the IBM IGA Germany EPG for the case study raw data and the IBM IGA Germany and Beta Test Side management for the given support. The author would also like to thank the reviewer(s) for pointing out some additional bibliographic sources.