Big Data Market Optimization Pricing Model Based on Data Quality
Table 1
Metric definitions, description, and calculation.
Attributes
Metric
Description
Variables
Formula
Accuracy
Proportion of accurate cells
Indicates the proportion of cells in a data source that have a correct value according to the domain and the type of information of the data source.
nce: number of cells with errors; ncl: number of cells
$pac = 1 - \frac{nce}{ncl}$
Completeness
Proportion of complete cells
Indicates the proportion of complete cells in a dataset. A cell is complete if it is not empty and has a meaningful value assigned (i.e., a value coherent with the domain of the column).
nr: number of rows; nc: number of columns; ic: number of incomplete cells; ncl: number of cells (ncl = nr × nc)
$pcc = 1 - \frac{ic}{ncl}$
Redundancy
Proportion of duplicate records
Redundancy expresses the proportion of duplicate records in the data source. Since this factor is a cost indicator (lower is better), we convert it to a benefit indicator (higher is better).
nr: number of rows; red: number of duplicate records
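As an illustration, the accuracy and completeness metrics of Table 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the formulas are pac = 1 - nce/ncl and pcc = 1 - ic/ncl with ncl = nr × nc, that the dataset is a rectangular list of rows, that a cell is incomplete when it is None or an empty string, and that correctness is decided by one hypothetical validator function per column. The function names, the validators, and the sample data are all invented for the example.

```python
from typing import Any, Callable, Sequence

Row = Sequence[Any]


def completeness(rows: Sequence[Row]) -> float:
    """pcc = 1 - ic / ncl, with ncl = nr * nc (assumed form of the formula)."""
    nr = len(rows)
    nc = len(rows[0]) if rows else 0
    ncl = nr * nc
    if ncl == 0:
        raise ValueError("dataset has no cells")
    # Assumption: a cell counts as incomplete if it is None or an empty string.
    ic = sum(1 for row in rows for value in row if value is None or value == "")
    return 1 - ic / ncl


def accuracy(rows: Sequence[Row],
             validators: Sequence[Callable[[Any], bool]]) -> float:
    """pac = 1 - nce / ncl (assumed form of the formula).

    validators[j] returns True when a value is correct for column j.
    """
    ncl = sum(len(row) for row in rows)
    if ncl == 0:
        raise ValueError("dataset has no cells")
    nce = sum(
        1
        for row in rows
        for check, value in zip(validators, row)
        if not check(value)
    )
    return 1 - nce / ncl


# Hypothetical 4 x 3 dataset: name, age, city code.
rows = [
    ["Alice", 30, "NY"],
    ["Bob", None, "LA"],   # missing age
    ["", 25, "XX"],        # empty name, unknown city code
    ["Dan", -5, "SF"],     # age outside the domain
]

# Hypothetical domain rules, one per column.
validators = [
    lambda v: isinstance(v, str) and v != "",
    lambda v: isinstance(v, int) and 0 <= v <= 120,
    lambda v: v in {"NY", "LA", "SF"},
]

print(round(completeness(rows), 3))          # 2 incomplete cells of 12
print(round(accuracy(rows, validators), 3))  # 4 erroneous cells of 12
```

In this sketch an empty cell is counted both as incomplete and as an error, because it fails its column validator. Whether the two metrics should overlap in this way is a design choice that the table does not settle.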