Research Article

Effective and Fast Near Duplicate Detection via Signature-Based Compression Metrics

Table 3

SigNCD versus the baselines on Chinese Finance News.

Algorithms          Prec.   Rec.    F1      Runtime (ms)
SigNCD w/ P1        0.98    0.97    0.98    970
NCD                 0.97    0.88    0.92    154487
SpotSigNCD w/ P1    0.98    0.80    0.88    7187
SpotSigs            0.97    0.90    0.93    8823
Google’s simhash    0.99    0.94    0.96    13151
SL+ST               0.94    0.61    0.74    8353481