Research Article

Big Data Market Optimization Pricing Model Based on Data Quality

Table 1

Metric definitions, description, and calculation.

| Attribute | Metric | Description | Variables | Formula |
| --- | --- | --- | --- | --- |
| Accuracy | Proportion of accurate cells | Indicates the proportion of cells in a data source that have a correct value according to the domain and the type of information of the data source. | nce: number of cells with errors; ncl: number of cells | $\mathit{pac} = 1 - \frac{\mathit{nce}}{\mathit{ncl}}$ |
| Completeness | Proportion of complete cells | Indicates the proportion of complete cells in a dataset, i.e., cells that are not empty and have a meaningful value assigned (a value coherent with the domain of the column). | nr: number of rows; nc: number of columns; ic: number of incomplete cells; ncl: number of cells | $\mathit{pcc} = 1 - \frac{\mathit{ic}}{\mathit{ncl}}$, with $\mathit{ncl} = \mathit{nr} \times \mathit{nc}$ |
| Redundancy | Proportion of duplicate records | Expresses the proportion of duplicate records in the data source. Since this factor is a cost indicator, it is converted to a benefit indicator. | nr: number of rows; red: number of duplicate records | $\mathit{pdc} = 1 - \frac{\mathit{red}}{\mathit{nr}}$ |
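
The three metrics in Table 1 can be illustrated with a minimal Python sketch. It assumes each formula has the form "one minus the ratio of defective items to total items" (pac = 1 - nce/ncl, pcc = 1 - ic/ncl, pdc = 1 - red/nr). The function names, the per-column validators, the rule that a blank or missing cell is incomplete, and the rule that every repeat of a row after its first occurrence counts as a duplicate are all illustrative assumptions, not definitions taken from the article.

```python
from typing import Callable, Optional, Sequence

Cell = Optional[str]
Row = Sequence[Cell]


def _cell_count(rows: Sequence[Row]) -> int:
    """ncl: total number of cells (nr * nc for a rectangular table)."""
    ncl = sum(len(row) for row in rows)
    if ncl == 0:
        raise ValueError("the data source has no cells")
    return ncl


def is_incomplete(cell: Cell) -> bool:
    """Assumption: a cell is incomplete if it is None or blank."""
    return cell is None or cell.strip() == ""


def accuracy(rows: Sequence[Row],
             validators: Sequence[Callable[[Cell], bool]]) -> float:
    """pac = 1 - nce / ncl, with one hypothetical validator per column."""
    ncl = _cell_count(rows)
    nce = sum(
        1
        for row in rows
        for cell, is_valid in zip(row, validators)
        if not is_valid(cell)
    )
    return 1 - nce / ncl


def completeness(rows: Sequence[Row]) -> float:
    """pcc = 1 - ic / ncl."""
    ncl = _cell_count(rows)
    ic = sum(1 for row in rows for cell in row if is_incomplete(cell))
    return 1 - ic / ncl


def redundancy(rows: Sequence[Row]) -> float:
    """pdc = 1 - red / nr (cost indicator converted to a benefit indicator)."""
    nr = len(rows)
    if nr == 0:
        raise ValueError("the data source has no rows")
    # Assumption: red counts every repeat of a row after its first occurrence.
    red = nr - len({tuple(row) for row in rows})
    return 1 - red / nr


# Toy data source with columns (name, age); values are made up for illustration.
rows = [
    ("alice", "30"),
    ("bob", ""),      # incomplete age
    ("alice", "30"),  # duplicate of the first record
    ("carol", "-5"),  # age outside the column's domain
]
validators = [
    lambda v: is_incomplete(v) or v.isalpha(),
    # Blank cells are left to the completeness metric, so they pass here.
    lambda v: is_incomplete(v) or (v.isdigit() and int(v) <= 120),
]

print(accuracy(rows, validators), completeness(rows), redundancy(rows))
# prints: 0.875 0.875 0.75
```

All three scores lie in [0, 1] and grow as quality improves, which is what lets the redundancy count, a cost indicator, be compared directly with the other two.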